SMOOTHING 3-D DATA FOR TORPEDO PATHS


I.  THE  GENERAL  PROBLEM 


A.  Data 


Data in the form of ordered quadruplets ($t_i$, $x_i$, $y_i$, and $z_i$) are available from 3-D files on torpedo and target paths. The times $t_i$ are sufficiently accurate that they can be assumed to be without error. The spatial coordinates $x_i$, $y_i$, and $z_i$, however, are not only subject to measurement errors, but also may contain erratic measurements or have measurements missing for some of the equally spaced time intervals.


B. Desired Output

Information  to  be  extracted  from  this  data  can  be  obtained  either  as: 

(1)  smoothed  information  as  a  function  of  time  (parametric  form),  or 

(2)  smoothed  information  at  a  particular  sequence  of  times  which  can  be 
specified. 

A comparison of the computational requirements of the two procedures will involve the length of the intervals used in smoothing and the number of times in the sequence of times of interest. Both procedures involve the same smoothing techniques.

The  information  to  be  extracted  from  the  3-D  data  includes: 

(1)    smoothed  position  coordinates 


(a) as functions of time (i.e., $x = f_x(t)$, $y = f_y(t)$, $z = f_z(t)$),

(b) at specified times $t_i$ (i.e., $x(t_i)$, $y(t_i)$, $z(t_i)$),

(3) velocity component estimates

(a) as functions of time (i.e., $V_x(t)$, $V_y(t)$, $V_z(t)$),

(b) at specified times $t_i$ (i.e., $V_x(t_i)$, $V_y(t_i)$, $V_z(t_i)$),

(4) relative torpedo and target geometry in the vicinity of intercept.

C.  Data  Sample 

The path of the torpedo involves maneuvers, so segments must be selected for application of the smoothing technique. The lengths of the segments, and hence the number of possible data points, are open to selection. Curves used to fit the data will primarily be polynomials. Longer path segments will generally require higher order polynomials and be more difficult to fit with acceptably small residuals. On the other hand, short intervals contain fewer data points and can limit the capability for reducing prediction errors. The trade-off must be resolved by considering potential paths and measurement errors. Some indication will be presented in subsequent sections of this report, where data for a specific torpedo path are analyzed. Initially, two sample sizes ($n = 11$ and $n = 21$) are considered.

One of the questionable features for small sample sizes is possible further reduction by deletion of data points which appear inconsistent with the remaining data.

II. DATA SMOOTHING

A.  Methodology 

The  data  smoothing  considered  in  this  report  is  limited  to  the  method  of 
least  squares.  Other  methods  such  as  Kalman  filtering  would  be  appropriate  for  real  time 
data  smoothing  where  interest  is  centered  on  the  next  data  point  following  the  data  used 
in  the  smoothing,  but  the  current  status  of  the  method  is  not  appropriate  for 
postexperimental  application  where  times  within  the  data  sample  are  of  interest. 

The  data  smoothing  techniques  currently  used  at  IVPS  involve  the  least 
squares  method  with  the  following  equations: 

(1) $x(t) = a + bt$ (linear)

(2) $x(t) = a + bt + ct^2$ (quadratic, parabolic)

(3) $x(t) = a + b \ln(t)$ (logarithmic).

This report concentrates on the addition of higher order polynomials, in particular:

(4) $x(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3$ (cubic)

(5) $x(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4$ (quartic).

The linear least squares technique is described in Appendix A. The sum of squares of the residuals

$$D = \sum_{i=1}^{n} \left[ x_i - \hat{x}(t_i) \right]^2$$

provides a basis for selection of the particular equation to be used in fitting a particular set of data. The statistic

$$S_e^2 = D/(n-k),$$

where $n$ is the number of points in the sample and $k$ is the number of parameters in the equation, provides an estimate of the variance of measurement errors.
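
The fit-and-score step can be sketched in Python as follows. This is a minimal illustration and not the procedure of Appendix A: the function names `polyfit_least_squares` and `residual_std` are hypothetical, the normal equations are solved by plain Gaussian elimination, and the sample values are the 11 x observations of Sample II (times 872-882, with the time scale shifted to -5..5) tabulated later in this report.

```python
import math


def polyfit_least_squares(t, x, degree):
    """Fit x(t) = a0 + a1*t + ... + a_degree*t**degree by least squares.

    Solves the normal equations (V^T V) a = V^T x, where V[i][j] = t_i**j.
    Returns the coefficients [a0, a1, ...].
    """
    k = degree + 1
    A = [[sum(ti ** (r + c) for ti in t) for c in range(k)] for r in range(k)]
    b = [sum(xi * ti ** r for ti, xi in zip(t, x)) for r in range(k)]

    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]

    # Back substitution.
    coeffs = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, k))
        coeffs[r] = (b[r] - s) / A[r][r]
    return coeffs


def residual_std(t, x, coeffs):
    """Return S_e = sqrt(D / (n - k)), with D the sum of squared residuals."""
    n, k = len(t), len(coeffs)
    D = sum(
        (xi - sum(a * ti ** j for j, a in enumerate(coeffs))) ** 2
        for ti, xi in zip(t, x)
    )
    return math.sqrt(D / (n - k))


# Sample II x values for the 11 points 872-882 (shifted time scale -5..5).
t = list(range(-5, 6))
x = [2183.2, 2241.8, 2305.5, 2377.1, 2451.6, 2533.8,
     2619.6, 2707.7, 2799.6, 2891.6, 2987.3]

for degree in (1, 2):
    coeffs = polyfit_least_squares(t, x, degree)
    print(degree, [round(a, 3) for a in coeffs],
          round(residual_std(t, x, coeffs), 2))
```

For the quadratic, the coefficients should come out close to the values 2533.9, 81.19, and 2.057 reported with Table 6b. Centering the time scale on zero, as the report does, also keeps the normal equations well conditioned for this simple solver.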

B.  Sequential  Differences 

A  preliminary  screening  of  sample  data  by  successive  differences  can  serve 
a  dual  purpose: 

(1)  indication  of  the  order  of  the  polynomial  required  to  produce  a 
reasonable  fit,  and 

(2) indication of isolated wild data points (outliers).

The first through fourth successive differences are presented in Table 1 when the actual relationship of $x$ to $t$ is linear and in Table 2 when the relationship is quadratic. A perturbation $d$ is introduced in one of the $x_i$.

There are several salient features of successive differences that should be noted:

(1) Ignore, for the moment, the perturbation $d$. In Table 1, the first differences (the $\Delta_{1i}$'s) consist of the velocity term $a_1$ plus noise. If $a_1$ is large with respect to the noise (the $n_i$'s), these differences will all have the same sign. The second differences (the $\Delta_{2i}$'s), however, involve only noise and their signs should be random. This change from consistent signs for the $\Delta_{1i}$'s to random signs for the $\Delta_{2i}$'s is an indication that a linear relationship of $x$ to $t$ is appropriate.

In passing, it should be noted that:

$$\bar{\Delta}_1 = \frac{1}{6} \sum_{i} \Delta_{1i} = a_1 + (n_6 - n_0)/6,$$

and that $\bar{\Delta}_1$ is normally distributed, i.e.

$$\bar{\Delta}_1 \sim N\left(a_1, \frac{\sigma^2}{18}\right).$$

It should also be noted that if $a_1$ is not large with respect to $\sigma$, the signs of the $\Delta_{1i}$'s can still have the sign of $a_1$, with the dominance of this sign depending upon the relative magnitudes of $a_1$ and $\sigma$.
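
As a check on the variance quoted for the mean first difference, the following short derivation may help. It assumes, as the text does, independent noise terms $n_i$ with common variance $\sigma^2$, and it further assumes unit time spacing with the six first differences spanning the observations indexed 0 through 6.

```latex
\bar{\Delta}_1
  = \frac{1}{6}\sum_{i=1}^{6} (x_i - x_{i-1})
  = \frac{x_6 - x_0}{6}
  = a_1 + \frac{n_6 - n_0}{6},
\qquad
\operatorname{Var}(\bar{\Delta}_1)
  = \frac{\operatorname{Var}(n_6) + \operatorname{Var}(n_0)}{36}
  = \frac{2\sigma^2}{36}
  = \frac{\sigma^2}{18}.
```

The intermediate noise terms cancel in the telescoping sum, which is why only the first and last observations contribute to the variance.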

Next, consider the quadratic case (Table 2). The $\Delta_{3i}$'s have random signs and the $\Delta_{2i}$'s are dominated by the sign of $a_2$; hence the quadratic is indicated as the appropriate polynomial. Note that the signs of the $\Delta_{1i}$'s may also be the same for all $i$ if $a_1$ and $a_2$ have the same sign. If $a_1$ and $a_2$ have opposite signs and $a_1$ is greater than $a_2$, then there can be a change in the sign of the $\Delta_{1i}$'s where $a_1 + (i^2 - (i-1)^2) a_2$ changes sign. In the vicinity of this point the $n_i$'s can become significant and produce some random sign terms.
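
The sign-change condition can be made explicit with a short worked equation. This is a sketch under the assumption that the model is $x_i = a_0 + a_1 i + a_2 i^2 + n_i$ with unit time spacing ($t_i = i$); the closed form for the crossover index is not stated in the report.

```latex
\Delta_{1i} = x_i - x_{i-1}
  = a_1 + \left(i^2 - (i-1)^2\right) a_2 + (n_i - n_{i-1})
  = a_1 + (2i - 1)\,a_2 + (n_i - n_{i-1}),
\qquad
a_1 + (2i - 1)\,a_2 = 0
  \;\Longrightarrow\;
  i = \frac{1}{2}\left(1 - \frac{a_1}{a_2}\right).
```

Near this index the deterministic part of the first difference is close to zero, so the noise term $n_i - n_{i-1}$ decides the sign.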


Higher order differences are required to deal with higher order polynomials. In general, random signs in $(k+1)$st order differences and consistent signs in $k$th order differences indicate selection of a $(k+1)$st order polynomial to fit the data.
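
The screening described above can be sketched in a few lines of Python. The helper names `successive_differences` and `sign_changes` are hypothetical, and the data are made-up values (a linear trend plus small noise), not measurements from this report. The sketch only counts sign changes per difference order; the choice of polynomial order is left to the analyst.

```python
def successive_differences(values, max_order):
    """Return {order: differences} for orders 1..max_order."""
    diffs = {}
    current = list(values)
    for order in range(1, max_order + 1):
        # Each pass differences the previous order once more.
        current = [b - a for a, b in zip(current, current[1:])]
        diffs[order] = current
    return diffs


def sign_changes(seq):
    """Count sign changes between consecutive nonzero entries."""
    signs = [v > 0 for v in seq if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Hypothetical data: linear trend of about 10 units per step plus small noise.
x = [100.0, 110.3, 119.8, 130.1, 140.2, 149.7, 160.0]

d = successive_differences(x, 3)
for order in sorted(d):
    print(order, sign_changes(d[order]), [round(v, 1) for v in d[order]])
```

For these values the first differences keep one sign while the second differences alternate irregularly, which is the pattern the text associates with a linear relationship.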


(2) The perturbation $d$ was included to provide an examination of the effect of an isolated outlier on successive differences. For illustrative purposes, it will be assumed that a successive difference greater than three times the standard deviation of the noise in that difference will be considered as an indication that a perturbation exists. The value $\sigma = 4$ will also be used for illustrative purposes.

Now, note the entries in the lower part of Table 1. Unless $a_1$ is known (or estimated), a critical magnitude for the $\Delta_{1i}$'s cannot be specified. For higher order differences, the $i$th difference of the $j$th order ($\Delta_{ji}$) has a normal distribution,

$$\Delta_{ji} \sim N(k_{ji} d, \sigma_j^2),$$

where $k_{ji}$ is the coefficient of $d$ in $\Delta_{ji}$. If $d = 0$ then:

$$\Delta_{ji} \sim N(0, \sigma_j^2).$$

The situation is an application of statistical hypothesis testing. If $\Delta_{ji}$ is larger than can be expected due to noise alone, then the presence of a perturbation (an outlier) is indicated. The critical magnitude, using the assumptions of $1.0 - 0.99 = 0.01$ as significance level and $\sigma = 4$, is presented in the last row of Table 1. Thus if $|\Delta_{ji}|$ exceeds the critical magnitude for its order $j$ (17, 18, or 17), for any $i$, then an outlier is indicated.

Note that the value $\sigma = 4$ was assumed for this illustration. If sequential differences are used for preliminary screening before least squares curve fitting is performed, the estimate $S_e$ for $\sigma$ will not be available. A value of $\sigma$ may be assumed from prior information on measurement errors, but for purposes of preliminary screening some value greater than 4 would permit elimination of data points with large perturbations.
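
The "three times the standard deviation of the noise in that difference" rule can be illustrated with a short Python sketch. It assumes independent noise with common standard deviation sigma, in which case a $j$th order difference has noise variance $\binom{2j}{j}\sigma^2$ (the sum of squared binomial coefficients). The function names are hypothetical, and the numbers produced are not taken from Table 1.

```python
import math


def difference_noise_sd(order, sigma):
    """Noise standard deviation of an order-j difference of i.i.d. noise.

    Var = sigma^2 * sum_k C(j, k)^2 = sigma^2 * C(2j, j).
    """
    return sigma * math.sqrt(math.comb(2 * order, order))


def critical_magnitude(order, sigma, multiple=3.0):
    """Threshold at `multiple` noise standard deviations (assumed rule)."""
    return multiple * difference_noise_sd(order, sigma)


def outlier_pattern(order, d=1.0):
    """Contribution of one isolated perturbation d to the consecutive
    order-j differences that contain it: alternating binomial weights."""
    return [(-1) ** k * math.comb(order, k) * d for k in range(order + 1)]


sigma = 4.0  # illustrative value used in the text
for order in (1, 2, 3, 4):
    print(order, round(critical_magnitude(order, sigma), 1),
          outlier_pattern(order))
```

The alternating weights returned by `outlier_pattern` are the reason an isolated outlier shows up as a (+, -, +) or (-, +, -) run in the second differences.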

It should be emphasized that the above discussion pertains to the simplest situations. For applications where there are missing data points, or where perturbations are not isolated, more guidance will be required. The assumption that the noise components (the $n_i$'s) are independent and have the same variance also warrants reservations in applications of the models.

III. APPLICATION

A.  Sample  Data 

A specific test in which a torpedo was launched against a submarine at the Naval Undersea Warfare Engineering facilities will be used for illustration. The 3-D data include equally spaced times from 814 to 1000; very few data points are missing. Figure 1 shows the torpedo path with every fifth point. Segments of this torpedo path are selected for application of the methodology presented in Section II. The presentation is restricted to the x and y coordinates.

B.  Data  Sample  I 

The  initial  21  points  (814-834)  appear  to  lie  in  a  straight  line  in  Figure  1  and 
were  selected  as  the  first  data  sample.   This  data  is  presented  in  Figure  2  and  Table  3. 

(1)  Successive  differences: 

The first and second order successive differences are also presented in Table 3. For the x component, all the first differences are negative and the second differences appear random (except possibly for the tail of the sample, where a sequence of four pluses occurs, including one value ($\Delta_{2,18} = 17.2$) which is large enough that it might indicate an outlier). The alternating signs (-, +, - or +, -, +) are not present, so an isolated outlier does not appear likely.

For the y component, all the first order successive differences are positive and the second order differences appear somewhat random. Again, $\Delta_{2,18} = -18.2$ indicates that something has occurred in the vicinity of $t_{18}$. Higher order differences were not explored for this sample.

(2)  Least  squares  smoothing: 

Both  linear  and  quadratic  functions  were  fitted  using  the  least  squares 
method  outlined  in  Appendix  A.   The  results  are  presented  below: 
Table 3. Successive Differences - Sample I

t_i    x_i       Δ_1i     Δ_2i     y_i        Δ_1i     Δ_2i
  1    5228.6    -71.8             -3465.1    +58.1
  2    5156.8    -71.7    +0.1     -3407.0    +60.8    +2.7
  3    5085.1    -68.8    +2.9     -3346.2    +61.1    +0.3
  4    5018.3    -74.1    -5.3     -3285.1    +62.9    +1.8
  5    4944.2    -66.1    +8.0     -3222.2    +56.6    -6.3
  6    4878.1    -78.1   -12.0     -3165.6    +59.8    +3.2
  7    4800.0    -68.3    +9.8     -3105.8    +56.1    -3.7
  8    4731.7    -79.5   -11.2     -3049.7    +62.5    +6.4
  9    4652.2    -68.6    +9.9     -2987.2    +56.4    -6.1
 10    4583.6    -72.9    -4.3     -2930.8    +60.2    +3.9
 11    4510.7    -70.5    +2.4     -2870.5    +59.7    -0.6
 12    4440.2    -73.2    -2.7     -2810.8    +60.8    +1.1
 13    4367.0    -70.0    +3.2     -2750.0    +60.0    -0.8
 14    4297.0    -70.9    -0.9     -2690.0    +63.3    +3.3
 15    4226.1    -72.5    -1.6     -2626.7    +55.1    -8.2
 16    4153.6    -69.6    +2.9     -2571.6    +60.0    +4.9
 17    4084.0    -66.3    +3.3     -2511.6    +62.5    +2.5
 18    4017.7    -49.1   +17.2     -2449.1    +44.3   -18.2
 19    3968.6    -44.0    +5.1     -2404.8    +47.7    +3.4
 20    3924.6    -56.6   -12.6     -2357.1    +48.0    +0.3
 21    3868.0                      -2309.1
Linear

$x(t) = 5288.3 - 69.78t$,    $S_{xe} = 16.73$

$y(t) = -3518.1 + 58.72t$,    $S_{ye} = 8.33$

Quadratic

$x(t) = 5318.6 - 77.67t + 0.3588t^2$,    $S_{xe} = 11.62$

$y(t) = -3532.0 + 62.33t - 0.1642t^2$,    $S_{ye} = 6.30$

The residual deviations

$$e_{xi} = x_i - \hat{x}(t_i)$$

$$e_{yi} = y_i - \hat{y}(t_i)$$

are shown in Figure 3. Note that there is a definite trend in these residuals starting about time $t_{18}$. Note also the general trend of the residuals, with a small random pattern superimposed on a curve for each residual set. Higher order polynomials could be used to remove the general curve (this was not explored). Note, further, that no violent outliers are indicated. The fitted linear function is shown in Figure 2, and the observed and predicted values for $x_i$ and $y_i$ are presented in Tables 4a and 4b together with the residuals in these components and the deviation

$$d_i = \sqrt{e_{xi}^2 + e_{yi}^2}.$$

The sequences of signs observed in Table 4a for the $e_{xi}$'s and $e_{yi}$'s are of interest. There is a sequence of +'s, followed by a sequence of -'s, and ending with a sequence of +'s for the $e_{xi}$'s. Similarly, there is a sequence of -'s, followed by a sequence of +'s, and ending with a sequence of -'s for the $e_{yi}$'s. (The sign of $e_{y8}$ can be ignored or changed since the magnitude of $e_{y8}$ is small.)
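
The residual quantities and the sign sequences discussed above can be sketched in Python as follows. The helper names `radial_deviation` and `sign_runs` are hypothetical. The first example uses the linear fit coefficients and the first observation of Sample I quoted in this report; the short residual list passed to `sign_runs` is illustrative only.

```python
import math


def radial_deviation(e_x, e_y):
    """d_i = sqrt(e_xi^2 + e_yi^2)."""
    return math.hypot(e_x, e_y)


def sign_runs(residuals):
    """Collapse residual signs into runs, e.g. [('+', 2), ('-', 3)]."""
    runs = []
    for e in residuals:
        if e == 0:
            continue  # a zero residual carries no sign information
        s = '+' if e > 0 else '-'
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs


# First point of Sample I (t = 1) against the linear fit quoted in the text.
x1, y1 = 5228.6, -3465.1
e_x1 = x1 - (5288.3 - 69.78 * 1)
e_y1 = y1 - (-3518.1 + 58.72 * 1)
d1 = radial_deviation(e_x1, e_y1)
print(round(e_x1, 1), round(e_y1, 1), round(d1, 1))

# Illustrative residual sequence with three runs of like sign.
print(sign_runs([10.1, 8.0, -8.1, -7.0, 6.1]))
```

A small number of long runs, as opposed to many short ones, is the non-random pattern the text reads as a sign that the fitted polynomial has not absorbed all of the path curvature.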
Figure 3. Least squares residuals -- Sample I.
Table 4a. Linear Regression - Sample I

t_i    x_i       x̂(t_i)    e_xi     y_i        ŷ(t_i)     e_yi     d_i
  1    5228.6    5218.5    +10.1    -3465.1    -3459.4     -5.7    11.6
  2    5156.8    5148.8     +8.0    -3407.0    -3400.7     -6.3    10.2
  3    5085.1    5079.0     +6.1    -3346.2    -3342.0     -4.2     7.4
  4    5018.3    5009.2     +9.1    -3285.1    -3283.2     -1.9     9.3
  5    4944.2    4939.4     +4.8    -3222.2    -3224.5     +2.3     5.3
  6    4878.1    4869.7     +8.4    -3165.6    -3165.8     +0.2     8.4
  7    4800.0    4799.9     +0.1    -3105.8    -3107.1     +1.3     1.3
  8    4731.7    4730.1     +1.6    -3049.7    -3048.4     -1.3     2.1
  9    4652.2    4660.3     -8.1    -2987.2    -2989.6     +2.4     8.5
 10    4583.6    4590.6     -7.0    -2930.8    -2930.9     +0.1     7.0
 11    4510.7    4520.8    -10.1    -2870.5    -2872.2     +1.7    10.2
 12    4440.2    4451.0    -10.8    -2810.5    -2813.5     +3.0    11.2
 13    4367.0    4381.2    -14.2    -2750.0    -2754.8     +4.8    15.0
 14    4297.0    4311.4    -14.4    -2690.3    -2696.0     +5.7    15.5
 15    4226.1    4241.7    -15.6    -2626.7    -2637.3     +9.6    18.3
 16    4153.6    4171.9    -18.3    -2571.6    -2578.6     +7.0    19.6
 17    4084.0    4102.1    -18.1    -2511.6    -2519.9     +8.3    19.9
 18    4017.7    4032.3    -14.5    -2449.1    -2461.2    +12.1    19.0
 19    3968.6    3962.5     +6.1    -2404.8    -2402.4     -2.4     6.5
 20    3924.6    3892.8    +31.8    -2357.1    -2343.7    -13.4    34.5
 21    3868.0    3823.0    +45.0    -2309.1    -2285.0    -24.1    51.1
Table 4b. Quadratic Regression - Sample I

t_i    x_i       x̂(t_i)    e_xi     y_i        ŷ(t_i)     e_yi     d_i
  1    5228.6    5241.3    -12.7    -3465.1    -3469.8     +4.7    13.5
  2    5156.8    5164.7     -7.9    -3407.0    -3408.0     +1.0     8.0
  3    5085.1    5088.8     -3.7    -3346.2    -3346.4     +0.2     3.7
  4    5018.3    5013.6     +4.7    -3285.1    -3285.3     +0.2     4.7
  5    4944.2    4939.2     +5.0    -3222.2    -3224.4     +2.2     5.5
  6    4878.1    4865.5    +12.6    -3165.6    -3163.9     -1.7    12.7
  7    4800.0    4792.5     +7.5    -3105.8    -3103.7     -2.1     7.8
  8    4731.7    4720.2    +11.5    -3049.7    -3043.8     -5.8    12.9
  9    4652.2    4648.6     +3.6    -2987.2    -2984.3     -2.9     4.6
 10    4583.6    4577.8     +5.8    -2930.8    -2925.1     -5.7     8.1
 11    4510.7    4507.6     +3.1    -2870.5    -2866.2     -4.3     5.3
 12    4440.2    4438.2     +2.0    -2810.8    -2807.6     -3.2     3.8
 13    4367.0    4369.5     -2.5    -2750.0    -2749.4     -0.6     2.6
 14    4297.0    4301.5     -4.5    -2690.0    -2691.5     +1.5     4.7
 15    4226.1    4234.2     -8.1    -2626.7    -2633.9     +7.2    10.8
 16    4153.6    4167.7    -14.1    -2571.6    -2576.7     +5.1    15.0
 17    4084.0    4101.9    -17.9    -2511.6    -2519.7     +8.1    19.7
 18    4017.7    4036.7    -19.0    -2449.1    -2463.2    +14.1    23.7
 19    3968.6    3972.3     -3.7    -2404.8    -2406.9     +2.1     4.3
 20    3924.6    3908.7    +15.9    -2357.1    -2351.0     -6.1    17.0
 21    3868.0    3845.7    +22.3    -2309.1    -2295.4    -13.7    26.2
These sign sequences would ordinarily indicate that the next higher order polynomial, a quadratic, should do well in reducing the residual errors. This is not substantiated, however, as Table 4b demonstrates. The deviations in this table have four sequences of the same sign and suggest that even a cubic polynomial will not necessarily produce an excellent fit to the data; this was not explored further.

An alternative to using higher order polynomials is a reduction in sample size. This alternative was explored for the sample with $n = 11$. The results are shown below:

                      Linear             Quadratic
Sample Points      S_xe    S_ye       S_xe    S_ye
814-824             3.3     2.0
819-829             2.9     1.9
824-834            15.4     9.5
829-839            13.9    11.1


The three basic causes for residuals are:

(a) maneuver of the object tracked (this is represented by the polynomial),

(b) noise in measurements (this is represented by $\sigma$, of which $S_e$ is an estimate), and

(c) outliers (these will be discussed later in this report).

It is assumed that there are no outliers in Sample I. Subsample 2 (points 819 to 829) appears to be fitted quite well by a straight line, and the quadratic was applied to give an estimate of the size of $\sigma$. The first subsample (points 814 to 824) is fitted reasonably well by a straight line, so the quadratic was not tried. The last two subsamples have substantially larger $S_e$'s. This could be caused by either torpedo maneuvers or a larger noise component (larger $\sigma$); this was not explored.
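
The overlapping 11-point subsamples used above can be generated mechanically. The sketch below is an assumption about how such windows might be enumerated: the function name `overlapping_windows` is hypothetical, and the step of 5 time units is inferred from the subsample labels listed in the text.

```python
def overlapping_windows(first, last, size=11, step=5):
    """Return (start, end) time pairs for windows of `size` points,
    advancing by `step`, that fit entirely within [first, last]."""
    windows = []
    start = first
    while start + size - 1 <= last:
        windows.append((start, start + size - 1))
        start += step
    return windows


# Windows covering times 814 through 839.
for start, end in overlapping_windows(814, 839):
    print(f"{start}-{end}")
```

Each window would then be fitted separately, and the resulting $S_e$ values compared across windows to see where the path or the noise changes character.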
C. Data Sample II

The second sample selected for study was the set with times 867 to 887. These 21 points appear to present a curved path which might possibly be fitted by a quadratic. First, consider the successive differences in Table 5. Some difficulty similar to an outlier is indicated in the vicinity of $i = 6$ ($t_i = 872$). Examination of the first successive differences shows a drop in velocity between $t_5$ and $t_6$ and only partial recovery between $t_6$ and $t_7$. One possible explanation would be an additional data point between $t_5$ and $t_7$. The actual explanation is the inadvertent introduction of a measurement from a different array taken at time $t_5$ and entered as the measurement at $t_6$. Measurements at $t_7$, and subsequent times, should be shifted to respective preceding times.

Instead of fitting all of Sample II, eleven points (872-882) were selected somewhat arbitrarily for fitting by least squares; these are plotted in Figure 4. The second differences all have the same sign and the third differences are small and have apparently random sign. The least squares straight line fit is presented in Table 6a and sketched in Figure 4. (Note the shift in the time scale.) This was introduced to reduce the magnitudes of the numbers calculated in determining the fitted line and $S_e$. In dealing with the quadratic, the means $\bar{x} = \frac{1}{11} \sum x_i$ and $\bar{y} = \frac{1}{11} \sum y_i$ were also subtracted from each observation $x_i$ and $y_i$, respectively, for the same reason. Table 6b presents the quadratic regression. The reduction in the $S_e$'s is dramatic, as would be expected from Figure 4. All of the $e_i$'s are less than 5 and hence within the residual noise that could be expected with a $\sigma$ of 2 or 3. The signs of the $e_{xi}$'s, however, show some indications of lack of randomness. For this reason, a third-degree polynomial was tried for the $x_i$'s only. This produced the value $S_{xe} = 0.946$ with the maximum magnitude of any $e_{xi}$ being 1.2. The cubic fits the data very well indeed.
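
The effect of shifting the time scale can be seen in a small Python sketch. With times centered on their midpoint, the sum of the shifted times is zero, so the straight-line intercept reduces to the mean of the observations and the slope to a single ratio. The function name `centered_linear_fit` is hypothetical; the data are the 11 x values of Sample II (872-882) from this report.

```python
def centered_linear_fit(times, values):
    """Fit values = a + b * (t - t_mid) with t_mid the mean time.

    Because the centered times sum to zero, a is the mean of the values
    and b = sum(tc * v) / sum(tc**2).
    """
    n = len(times)
    t_mid = sum(times) / n
    tc = [t - t_mid for t in times]
    a = sum(values) / n
    b = sum(t * v for t, v in zip(tc, values)) / sum(t * t for t in tc)
    return t_mid, a, b


times = list(range(872, 883))  # 872 .. 882
x = [2183.2, 2241.8, 2305.5, 2377.1, 2451.6, 2533.8,
     2619.6, 2707.7, 2799.6, 2891.6, 2987.3]

t_mid, a, b = centered_linear_fit(times, x)
print(t_mid, round(a, 1), round(b, 2))
```

The intercept and slope should agree with the straight-line fit reported with Table 6a (2554.4 and 81.19), and the sums involved stay small compared with those formed from raw times near 877.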

D. Data Sample III

The third sample selected for study involved an S-shaped maneuver as indicated by the 21 points (848-868) shown in Figure 5. The x and y coordinates of these points are presented in Figure 6, where it is evident that first and second order polynomials will not provide acceptable fits to the data. A third-order polynomial appears possible for the $y_i$'s and a fourth order for the $x_i$'s. A subset of 11 points (851-861, or points 4-14 in Figure 6 and Table 7) will be used for illustration.
Table 5. Successive Differences - Sample II

t_i    x_i       Δ_1      Δ_2      Δ_3      y_i        Δ_1      Δ_2       Δ_3
  1    2012.0    +18.0                      -1255.5    +94.2
  2    2030.0    +26.1    +8.1     +0.7     -1161.3    +91.1     -3.1      +0.1
  3    2056.1    +34.9    +8.8     -0.4     -1070.2    +88.1     -3.0      +1.9
  4    2091.0    +43.2    +8.3    -43.4      -982.1    +87.0     -1.1    -106.5
  5    2134.2     +8.1   -35.1    +70.9      -895.1    -20.6   -107.6    +227.9
  6    2142.3    +40.9   +32.8    -15.1      -915.7    +99.7   +120.3    -142.3
  7    2183.2    +58.6   +17.7    -12.6      -816.0    +77.7    -22.0     +12.8
  8    2241.8    +63.7    +5.1     +2.8      -738.3    +68.5     -9.2      +6.3
  9    2305.5    +71.6    +7.9     -5.0      -669.8    +65.6     -2.9      -7.1
 10    2377.1    +74.5    +2.9     +4.8      -604.2    +55.6    -10.0      +5.2
 11    2451.6    +82.2    +7.7     -4.1      -548.6    +50.8     -4.8      -2.8
 12    2533.8    +85.8    +3.6     -1.3      -497.8    +43.2     -7.6      -0.8
 13    2619.6    +88.1    +2.3     +1.5      -454.6    +34.8     -8.4      +0.4
 14    2707.7    +91.9    +3.8     -3.7      -419.8    +26.8     -8.0      -0.4
 15    2799.6    +92.0    +0.1     +3.6      -393.0    +18.4     -8.4      -2.2
 16    2891.6    +95.7    +3.7     -3.5      -374.6     +7.8    -10.6      +3.5
 17    2987.3    +95.9    +0.2     -1.8      -366.8     +0.7     -7.1      -4.0
 18    3083.2    +94.3    -1.6     +5.9      -366.1    -10.4    -11.1      -1.7
 19    3177.5    +98.7    +4.3     -9.2      -376.5    -23.2    -12.8      +9.5
 20    3276.2    +93.8    -4.9               -399.7    -26.5     -3.3
 21    3370.0                                -426.2
Table 6a. Linear Regression - 11 Points (872-882)

t_i    x_i       x̂(t_i)    e_xi     y_i       ŷ(t_i)    e_yi     d_i
 -5    2183.2    2148.5    +34.7    -816.0    -762.0    -54.0    64.2
 -4    2241.8    2229.7    +12.1    -738.3    -716.5    -21.8    24.9
 -3    2305.5    2310.9     -5.4    -669.8    -671.1     +1.3     5.6
 -2    2377.1    2392.1    -15.0    -604.2    -625.7    +21.5    26.2
 -1    2451.6    2473.2    -21.6    -548.6    -580.2    +31.6    38.3
  0    2533.8    2554.4    -20.6    -497.8    -534.8    +37.0    42.4
  1    2619.6    2635.6    -16.0    -454.6    -489.4    +34.8    38.3
  2    2707.7    2716.8     -9.1    -419.8    -443.9    +24.1    25.7
  3    2799.6    2798.0     +1.6    -393.0    -398.5     +5.5     5.7
  4    2891.6    2879.2    +12.4    -374.6    -353.1    -21.5    24.8
  5    2987.3    2960.4    +26.9    -366.1    -307.6    -58.5    64.4

$\hat{x}(t) = 2554.4 + 81.19t$        $\hat{y}(t) = -534.8 + 45.43t$

$S_{xe} = 20.33$        $S_{ye} = 36.41$
Table 6b. Quadratic Regression - 11 Points (872-882)

t_i    x_i       x̂(t_i)    e_xi    y_i       ŷ(t_i)    e_yi    d_i
 -5    2183.2    2179.3    +3.9    -816.0    -817.8    +1.8    4.3
 -4    2241.8    2242.0    -0.2    -738.3    -738.9    +0.6    0.6
 -3    2305.5    2308.8    -3.3    -669.8    -667.4    -2.4    4.1
 -2    2377.1    2379.7    -2.6    -604.2    -603.3    -0.9    2.8
 -1    2451.6    2454.7    -3.1    -548.6    -546.7    -1.9    3.6
  0    2533.8    2533.9    -0.1    -497.8    -497.6    -0.2    0.2
  1    2619.6    2617.1    +2.5    -454.6    -455.9    +1.3    2.8
  2    2707.7    2704.5    +3.2    -419.8    -421.6    +1.8    3.7
  3    2799.6    2796.0    +3.6    -393.0    -394.8    +1.8    4.0
  4    2891.6    2891.6     0.0    -374.6    -375.5    -0.8    0.8
  5    2987.3    2991.3    -4.0    -366.1    -363.5    -2.6    4.8

$\hat{x}(t) = 2533.9 + 81.19t + 2.057t^2$        $\hat{y}(t) = -497.6 + 45.43t - 3.724t^2$

$S_{xe} = 3.32$        $S_{ye} = 1.91$
The results of fitting third-degree polynomials to these 11 points are presented in Table 8 and the fourth-degree polynomial in Table 9. The cubic equation fits the y component quite well, but even the quartic equation leaves something to be desired (smaller $S_e$) for the x component. Higher order polynomials were not tried. The estimates $S_e$ for $\sigma$ obtained by fitting polynomials to the subsample of 11 points are presented below:

Order of
Polynomial       X       Y
    1           66.8    94.5
    2           37.3    42.6
    3           34.0     3.5
    4            9.3

Improvement in fitting the y component by increasing the order of the polynomial is quite dramatic, but the improvement is considerably slower for the x component. The third-order polynomial could be considered acceptable for y, but a fifth-order polynomial should be tried for x. The order of polynomial used does not have to be the same for both components.

E.  Discussion 

Only  one  in-water  run  was  examined  and,  for  it,  only  selected  sections  of  the 
torpedo  path  were  treated  in  any  detail.  Nevertheless  some  conclusions  can  be  made 
about  application  of  the  Sequential  Differences  and  Least  Squares  Regression  techniques 
to  3-D  data. 

(1)    Sequential  differences: 

(a)  These  differences  provide  some  capability  for  locating  isolated 
outlier  points  which  differ  substantially  from  the  path  of  the  object  being  tracked.  This 
was  illustrated  in  Sample  II.  The  model  shown  in  Tables  1  and  2  needs  extension  to  higher 
order  polynomial  paths  and  multiple  outliers.  Also,  the  critical  magnitudes  for  sequential 
differences  (refer  to  Table  1)  must  be  increased  to  allow  for  accelerations  since  the  use  of 
sequential  differences  will  precede  fitting  a  polynomial  and  hence  the  order  of  the  fitted 
polynomial  will  not  be  known  at  the  time.  Thus  sequential  differences  should  be  used  only 
for  a  first  screening  for  gross  outliers. 
Figure 5. Data sample III -- points 848-868.
Table 7. Successive Differences - Sample III

t_i    x_i       Δ_1      Δ_2      Δ_3      y_i        Δ_1      Δ_2      Δ_3
  1    2949.3    -40.5                      -1364.0    +74.4
  2    2889.3    -56.8   -10.8    +10.1     -1289.6    +74.0     -0.4    -17.9
  3    2828.5    -51.5    -0.7    -22.3     -1215.6    +56.5    -17.5    +49.8
  4    2777.0    -74.5   -23.0    +12.8     -1159.1    +88.8    +32.3    -83.6
  5    2702.5    -84.7   -10.2     +1.4     -1070.3    +37.5    -51.3    +14.8
  6    2617.8    -93.5    -8.8    +18.0     -1032.8     +1.0    -36.5     -8.5
  7    2524.3    -84.3    +9.2    +20.8     -1031.8    -44.0    -45.0    +16.6
  8    2440.0    -54.3   +30.0     +8.4     -1075.8    -72.4    -28.4     +9.1
  9    2385.7    -15.9   +38.4     +3.2     -1148.2    -91.7    -19.3    +32.1
 10    2369.8    +25.7   +41.6    -56.7     -1239.9    -78.9    +12.8    -25.3
 11    2395.5    +10.6   -15.1    -28.5     -1328.8    -91.4    -12.5    +15.4
 12    2406.1    -33.0   -43.6     +8.8     -1420.2    -88.5     +2.9    +21.2
 13    2373.1    -67.8   -34.8    +13.4     -1508.7    -64.4    +24.1    +15.3
 14    2305.3    -89.2   -21.4    +19.2     -1573.1    -25.0    +39.4     +2.4
 15    2216.1    -91.4    -2.2    +17.9     -1598.1    +16.8    +41.8     -4.9
 16    2124.7    -75.7   +15.7    +17.8     -1581.3    +53.7    +36.9     -7.0
 17    2049.0    -42.2   +33.5     +4.2     -1527.6    +83.6    +29.9    -20.3
 18    2006.8     -4.5   +37.7    -23.5     -1440.0    +93.2     +9.6     -7.5
 19    2002.3     +9.7   +14.2     -5.9     -1350.8    +95.3     +2.1     -3.2
 20    2012.0    +18.0    +8.3              -1255.5    +94.2     -1.1
 21    2030.0                               -1161.3
Table 8. Cubic Regression - Sample III (11 points)

t_i    x_i       x̂(t_i)    e_xi     y_i        ŷ(t_i)     e_yi    d_i
 -5    2777.0    2804.8    -27.8    -1159.1    -1159.0    -0.1    27.8
 -4    2702.5    2680.2    +22.3    -1070.3    -1066.7    -3.6    22.6
 -3    2617.8    2583.9    +33.9    -1032.8    -1028.6    -4.2    34.2
 -2    2524.3    2511.8    +12.5    -1031.8    -1035.5    +3.7    13.0
 -1    2440.0    2459.6    -19.6    -1075.8    -1078.3    +2.5    19.8
  0    2385.7    2423.3    -37.6    -1148.2    -1147.7    -0.5    37.6
  1    2369.8    2398.6    -28.8    -1239.9    -1234.7    -5.2    29.3
  2    2395.5    2381.3    +14.2    -1328.8    -1330.0    +1.2    14.3
  3    2406.1    2367.6    +38.5    -1420.2    -1424.5    +4.3    38.7
  4    2373.1    2352.8    +20.3    -1508.7    -1509.1    +0.4    20.3
  5    2305.3    2333.1    -27.8    -1573.1    -1574.5    +1.4    27.8

$\hat{x}(t) = 2423.3 - 29.812t + 5.827t^2 - 0.6943t^3$

$\hat{y}(t) = -1147.7 - 79.73t - 8.761t^2 + 1.5271t^3$

$S_{xe} = 34.0$        $S_{ye} = 3.5$
Table 9. Quartic Regression - Sample III (11 points)

t_i    x_i       x̂(t_i)    e_xi
 -5    2777.0    2774.0     +3.0
 -4    2702.5    2711.0     -8.5
 -3    2617.8    2614.8     +3.0
 -2    2524.3    2516.9     +7.4
 -1    2440.0    2439.1     +0.9
  0    2385.7    2392.5     -6.8
  1    2369.8    2378.1     -8.3
  2    2395.5    2386.6     +8.9
  3    2406.1    2398.4     +7.7
  4    2373.1    2383.7    -10.6
  5    2305.3    2302.3     +3.0

$\hat{x}(t) = 2392.4 - 29.812t + 16.533t^2 - 0.6943t^3 - 0.428234t^4$

$S_{xe} = 9.3$
(b)  Sequential differences also provide some indication of the order of
polynomial that will be required. One indicator is the number of sign changes that occur
in the successive differences of a particular order. If there are few sign changes, a
non-random effect is present and a higher-order polynomial is called for. Thus, for
example, in Sample II the 11-point data subset shows a long sequence of +'s for the
differences of one order, but no such sequence (indicating randomness) for the differences
of the next order. Hence, a third-order polynomial can be expected to provide some
improvement over a second-order polynomial. This type of information may be difficult to
incorporate into a data smoothing algorithm, but even some simple procedure can be of help
in reducing the computational load.
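
As a rough illustration of the sign-change indicator, the sketch below forms sequential differences and prints their sign patterns. It assumes the x-coordinates of the 11-point Sample II record as tabulated in Appendix B-1; the function names are illustrative and are not part of the report.

```python
# x-coordinates of Sample II (11 points), as tabulated in Appendix B-1.
x = [2183.2, 2241.3, 2305.5, 2377.1, 2451.6, 2533.8,
     2619.6, 2707.7, 2799.6, 2891.6, 2987.3]


def differences(values, order):
    """Return the sequential differences of the given order."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def sign_pattern(values):
    """Render each difference as '+' or '-' (a zero is shown as '+')."""
    return "".join("+" if v >= 0 else "-" for v in values)


def sign_changes(pattern):
    """Count the places where adjacent signs differ."""
    return sum(1 for a, b in zip(pattern, pattern[1:]) if a != b)


for order in (1, 2, 3):
    pattern = sign_pattern(differences(x, order))
    print(order, pattern, sign_changes(pattern))
```

A long unbroken run of one sign at a given order points to a non-random component at that order, while frequent sign changes are what pure measurement noise would be expected to produce.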

(2)  Sample  Size: 

(a)  Although it is possible that a sample of 21 points could be fitted
with acceptably small S_e in some instances (the quadratic was not tried on Sample II), it
would appear that smaller samples (e.g., n=11) will allow fitting the data with a
reasonably low-order polynomial. The size n=11 is not sacrosanct but will leave some
room for elimination of outliers and so seems to be a reasonable size.

(3)  Least  squares  smoothing: 

(a)  By its nature, the estimate S_e, for the standard deviation σ of
the measurement noise, is monotone decreasing as the order of the polynomial increases.
(An n-1 order polynomial should be able to fit n points exactly so that S_e would be zero.)
The appropriate order polynomial is one which reduces S_e to the level of the noise in the
measurements. This may vary with the path and the array making the measurements. For
the portions of the path examined, it is suspected that σ_y is less than σ_x since S_ye is
generally smaller than S_xe for a given order polynomial. The decision to use a
higher-order polynomial to fit a set of data depends upon the value of S_e obtained for a
given-order polynomial. If S_e is small (3 or 4), then higher-order polynomials cannot be
expected to give much improvement. The extent to which S_e can be reduced will depend upon
the component as well as the polynomial degree.
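
The dependence of S_e on polynomial degree can be checked numerically. The following is a minimal sketch, assuming the Sample III x-values at t = -5, ..., 5 as tabulated in the quartic regression table (Table 9); it solves the normal equations by plain Gaussian elimination rather than by the pivotal condensation format of Appendix A, and the function names are illustrative.

```python
import math

# Sample III x-coordinates at t = -5..5 (values as tabulated in Table 9).
t = list(range(-5, 6))
x = [2777.0, 2702.5, 2617.8, 2524.3, 2440.0, 2385.7,
     2369.8, 2395.5, 2406.1, 2373.1, 2305.3]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def fit(t, x, degree):
    """Least squares polynomial fit; returns (coefficients, S_e)."""
    m = degree + 1
    normal = [[sum(ti ** (p + q) for ti in t) for q in range(m)] for p in range(m)]
    rhs = [sum(xi * ti ** p for ti, xi in zip(t, x)) for p in range(m)]
    coef = solve(normal, rhs)
    resid = [xi - sum(c * ti ** p for p, c in enumerate(coef))
             for ti, xi in zip(t, x)]
    d = sum(e * e for e in resid)            # residual sum of squares
    return coef, math.sqrt(d / (len(t) - m))  # n - k - 1 degrees of freedom


for k in range(1, 6):
    print(k, round(fit(t, x, k)[1], 1))
```

The values printed for degrees 3 and 4 can be compared with the S_xe figures quoted for the cubic and quartic regressions on Sample III.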




(4)  Outliers: 

(a)  In addition to rough screening for outliers by sequential
differences, there is additional screening that can be performed using residual errors after
a polynomial has been fitted to the data. Outliers contribute substantially to S_e, and the
two basic techniques of reducing S_e are elimination of points with large residuals, or
increasing the order of the polynomial.

(b)  Elimination of outliers using residuals after smoothing can be
accomplished in two ways:

(1)  by confidence intervals: a residual greater in magnitude than
some specified multiple (3 or larger) of S_e can be considered to be an outlier, and

(2)  by variance reduction: the ratio of the S_e's before and after
removal of a point, or points, with substantial residuals can be used as a basis for the
decision on whether to remove the points. For example, if S_e(after)/S_e(before) ≤ r,
then the points should be removed (Grubbs' criterion). The value of r is in the range 0.0 to
1.0 and could be changed depending upon the magnitude of S_e.
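
The variance-reduction test in (2) can be sketched as follows. This is only an illustration under stated assumptions: a straight-line fit is used for brevity, the threshold r = 0.5 is an arbitrary choice within the stated range, and the data are invented (a straight line with small noise and one erratic reading), not taken from the 3-D files.

```python
import math


def line_fit_se(t, x):
    """Fit x = a + b*t by least squares and return S_e = sqrt(D / (n - 2))."""
    n = len(t)
    st, sx = sum(t), sum(x)
    a_tt = sum(ti * ti for ti in t) - st * st / n
    a_tx = sum(ti * xi for ti, xi in zip(t, x)) - st * sx / n
    a_xx = sum(xi * xi for xi in x) - sx * sx / n
    d = max(a_xx - a_tx * a_tx / a_tt, 0.0)  # guard against round-off
    return math.sqrt(d / (n - 2))


def variance_ratio_outlier(t, x, r=0.5):
    """Index of the point whose removal passes S_e(after)/S_e(before) <= r.

    r = 0.5 is a hypothetical threshold; returns None if no point qualifies.
    """
    before = line_fit_se(t, x)
    best_index, best_after = None, before
    for i in range(len(t)):
        after = line_fit_se(t[:i] + t[i + 1:], x[:i] + x[i + 1:])
        if after < best_after:
            best_index, best_after = i, after
    if best_index is not None and before > 0 and best_after / before <= r:
        return best_index
    return None


# Invented data: small noise about a straight line, one erratic reading at index 6.
t = list(range(-5, 6))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, 0.3, -0.1, -0.3, 0.2, -0.1]
clean = [100.0 + 8.0 * ti + e for ti, e in zip(t, noise)]
x = clean[:]
x[6] += 60.0

print(variance_ratio_outlier(t, x))      # index of the erratic reading
print(variance_ratio_outlier(t, clean))  # no point qualifies
```

Only one point is tested for removal at a time here; removing several points at once, as the text allows, would need a search over subsets.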

(5)  Sampling  rate: 

(a)  The smoothing of 3-D data can be performed to provide either a
parametric representation of path segments, or specific information, such as position and
velocity information, only at certain points on the path. These will be called "parametric
estimation" and "point estimation," respectively.

(b)  To illustrate parametric estimation, consider data collected at
200 sequential observation times (e.g., 800 to 1,000 for the 3-D data used in this section).
Samples of 11 points will be used. Sample S_1 will consist of points 1 through 11, sample S_2
of points 10 through 20 and, in general, sample S_j of points from 10(j-1) to 10j. There will
then be 20 samples on the path. Each sample of 11 points is to be fitted by a polynomial
of appropriate degree and the parameters of the polynomial together with the value of S_e
recorded for the path segment represented by that sample. Note that there will be two
points of overlap between S_1 and S_2 and one point of overlap thereafter.
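
The sample layout just described can be generated mechanically. The sketch below assumes 200 points numbered from 1 and follows the text's convention that S_1 starts at point 1; the function names are illustrative.

```python
def parametric_samples(n_points=200, step=10):
    """Return (first, last) point numbers for each parametric sample S_j.

    S_1 covers points 1..11 and S_j (j >= 2) covers points 10(j-1)..10j,
    so every sample holds 11 points.
    """
    samples = [(1, step + 1)]
    for j in range(2, n_points // step + 1):
        samples.append((step * (j - 1), step * j))
    return samples


def overlap(first, second):
    """Number of points shared by two consecutive samples."""
    return max(0, first[1] - second[0] + 1)


samples = parametric_samples()
print(len(samples), samples[:3])
print(overlap(samples[0], samples[1]), overlap(samples[1], samples[2]))
```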
(c)  For point estimation, a sequence of points must be provided. For
data consisting of 200 points it may be considered that occasional monitoring is sufficient
for points 0 to 50 and 100 to 150, but that behavior of the path from points 50 to 100
should be monitored more often and behavior from points 150 to 200 should be followed
closely.  Then the following sequence of points could be considered reasonable:


  j     Points in S_j     Midpoint t_j

  1        5-15               10
  2       25-35               30
  3       45-55               50
  4       55-65               60
  5       65-75               70
  6       75-85               80
  7       85-95               90
  8       95-105             100
  9      115-125             120
 10      135-145             140
 11      145-155             150
 12      150-160             155
 13      155-165             160
 14      160-170             165
 15      165-175             170
 16      170-180             175
 17      175-185             180
 18      180-190             185
 19      185-195             190
 20      190-200             195
(d)  At each midpoint time t_j, the position coordinate estimates, the
velocity in these components, the resultant velocity, and S_e can be recorded together
with additional information, such as acceleration components, if desired. Note that the
sequence of 20 points suggested above has substantial overlap of samples in some cases
and data gaps between samples in other cases. This was introduced intentionally since
least squares smoothing produces better estimates (smaller confidence intervals) at the
midpoint of the sample when the fitted curve is a straight line (refer to Appendix B).

(e)  Parametric estimation could also be modified to delete some
samples (e.g., alternate samples from t = 100 to t = 150). It would require greater
modification to achieve the quality of the point estimation procedure at other than
parametric sample midpoints when a straight line (first-order polynomial) is used. When
higher-order polynomials are required, the preference for the best estimate at the midpoint
of the sample is lost (refer to Appendix B). Making both techniques available provides some
flexibility in data smoothing to accommodate potential customers.
IV.  A  DATA  SMOOTHING  ALGORITHM

The  following  procedure  is  suggested  for  smoothing  3-D  data: 

Step 1:  Select appropriate sample size.  (11 is suggested as being small enough to
provide some capability of fitting path segments of maneuvering torpedoes without
requiring high-order polynomials.  Some leeway for dropping outliers is also provided.)

Step 2:  Select parametric or point estimation.

Step 3:  Select sampling rate.  (A standard rate such as described in Section III F4
should be provided as a default rate for parametric estimation and the midpoints of these
samples as a default rate for point estimation.)

Step 4:  Adjust data for missing data points.  (The principle applied here is
minimization of the effect of the supplied numbers on sequential differences. For a single
missing datum, the average of the values at the two adjacent times will minimize the second
differences. In any case, data supplied in this step must be removed before least squares
smoothing is applied.)
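
A minimal sketch of Step 4 for an isolated gap is given below. It assumes missing readings are marked with `None` (a convention chosen here for illustration) and records which indices were supplied so they can be dropped again before fitting.

```python
def fill_single_gaps(values):
    """Replace an isolated None by the mean of its two neighbours.

    Returns the filled list and the indices that were supplied, so those
    points can be removed again before the least squares fit.
    """
    filled, supplied = list(values), []
    for i in range(1, len(values) - 1):
        if (values[i] is None and values[i - 1] is not None
                and values[i + 1] is not None):
            filled[i] = (values[i - 1] + values[i + 1]) / 2.0
            supplied.append(i)
    return filled, supplied


# Start of the Sample II x-record (Appendix B-1) with one reading withheld
# purely for illustration.
x = [2183.2, 2241.3, None, 2377.1, 2451.6]
filled, supplied = fill_single_gaps(x)

# The second difference centred on the supplied point vanishes by construction.
second_diff = filled[3] - 2 * filled[2] + filled[1]
print(filled[2], supplied, second_diff)
```

Gaps of two or more consecutive readings are left untouched by this sketch; the text does not prescribe a rule for them.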

Step 5:  Calculate first, second, and third order sequential differences.

Step 6:  Determine approximate polynomial order k.  (The (k+1)st order sequential
differences should contain noise only, and thus, have random signs.  Sequences of 4, or
more, differences with the same sign suggest the presence of a non-random component, as
does the occurrence of 4, or fewer, changes of sign.  The presence of a non-random
component is going to be awkward to identify.  If the second differences are random, then
k=1.  If the second-order differences are non-random, but the third-order differences are
random, then k=2.  If the third-order differences are non-random, then fourth-order
differences should be calculated and examined for randomness.  (This examination of

sequential  differences  in  increasing  order  should  probably  not  be  carried  beyond  the  f 

order.) 
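
One possible reading of the Step 6 rule is sketched below: a set of differences is treated as random only if its longest run of like signs is shorter than 4 and it shows more than 4 sign changes. The cut-off `max_order` is a hypothetical parameter, and the data are the Sample II x-coordinates from Appendix B-1.

```python
def differences(values, order):
    """Return the sequential differences of the given order."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def looks_random(diffs, run_limit=4, min_changes=5):
    """Apply the two sign rules of Step 6 to one set of differences."""
    signs = [d >= 0 for d in diffs]
    longest = run = 1
    changes = 0
    for prev, cur in zip(signs, signs[1:]):
        if cur == prev:
            run += 1
            longest = max(longest, run)
        else:
            run = 1
            changes += 1
    return longest < run_limit and changes >= min_changes


def approximate_order(values, max_order=4):
    """Smallest k whose (k+1)st-order differences look random, else None."""
    for k in range(1, max_order + 1):
        if looks_random(differences(values, k + 1)):
            return k
    return None


x = [2183.2, 2241.3, 2305.5, 2377.1, 2451.6, 2533.8,
     2619.6, 2707.7, 2799.6, 2891.6, 2987.3]
print(approximate_order(x))
```

With only 11 points the higher-order difference sets become short, so the "4 or fewer changes" rule grows harder to satisfy as k increases; this is part of why the text calls the identification awkward.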
Step 7:  Screen successive differences for gross outliers.  (This must follow
determination of the approximate degree of polynomial since it should be based on comparison
of magnitude of deviation to noise only as indicated in Tables 1 and 2. The critical values
suggested in those tables should be increased substantially. Some limit, possibly between
50 and 100, should be selected keeping in mind that this is a first screening for gross
outliers and a second screening will be made. Any outliers found in this step, however,
will reduce computations in later steps. Remove any outliers found and the observations
for the other space components at the same observation time.)

Step 8:  Check for polynomial degree compatibility.  (If the number of outliers
removed (r) satisfies the inequality r + k ≥ n-1, where k is the degree of polynomial found
in Step 6 and n is the sample size after data points supplied in Step 4 are removed, then
fitting a kth-order polynomial will be inappropriate. For example, if r = 4 points are
removed from a sample in which one data point has been created in Step 4, then a
polynomial of degree 5 can be fitted to the data without any residual errors since there
are 6 linear relationships of the 6 coefficients.)
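
The compatibility check reduces to a single comparison. The sketch below restates it with the worked example from the text; the function name is illustrative.

```python
def degree_is_compatible(n, r, k):
    """True when a degree-k fit still leaves residual degrees of freedom.

    n: sample size after the Step 4 fill-ins are removed
    r: number of outliers removed in Step 7
    k: polynomial degree from Step 6
    """
    return r + k < n - 1


# Example from the text: 11 points, one created in Step 4 (n = 10),
# r = 4 outliers removed, degree 5 requested: 6 points, 6 coefficients.
print(degree_is_compatible(n=10, r=4, k=5))   # False
print(degree_is_compatible(n=11, r=1, k=2))   # True
```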

Step 9:  Fit a polynomial of degree k to the data.  (The least squares procedure
outlined in Appendix A is applicable.  At this step only S_ke need be determined and not the
coefficients.)

Step 10:  Seek acceptable S_e.  (If S_ke is unacceptably large, repeat Step 9 with k
replaced by k + 1.  Repeat this step until either S_e is acceptable or a polynomial of
degree 5 is fitted to the data.)

Step 11:  Complete least squares polynomial fit.  (The coefficients for the polynomial
of the degree found in Step 10 are now needed, and the residual errors.)

Step 12:  Second screening for outliers.  (One of the procedures discussed in Section III
E3 should be applied to locate any outliers not found in Step 7.  Remove the outliers.)

Step 13:  Repeat Steps 9, 10, 11, and 12 until no more outliers are found.  (The
polynomial obtained will be used for smoothing sample data. Note that the alternative
procedure of searching residuals for each polynomial degree to locate outliers may result
in removing points which are not actually outliers but legitimate observations for a higher
degree polynomial. On the other hand, the proposed method could use a higher order
polynomial to fit outliers when a lower order polynomial should actually be used. There is
a choice of the type of misfit that is acceptable.)

Step 14:  Record smoothed path.  (For parametric form, if specified in Step 2,
recorded data includes coefficients of fitted polynomial, S_ej, and n_j for each sample S_j
specified in Step 3.  For point estimation form, if specified in Step 2, recorded data
includes: time t_j, estimated coordinates x_j = x(t_j), y_j = y(t_j), and z_j = z(t_j), velocity
components, S_ej, and n_j for each point specified in Step 3.  Additional path information
may also be specified; e.g., acceleration components.)
V.  CONCLUSIONS  AND  RECOMMENDATIONS

The procedure suggested in Section IV provides a reasonable approach for
obtaining the information desired in parts (1), (2), and (3) of Section I B. No attempt has
been made to provide the information in part (4).

In implementing this procedure, several parameters must be provided:

A.  Sample Size (Step 1)

A smaller sample size of n=7 has been suggested. This would permit fitting
path segments containing maneuvers with lower order polynomials, but is subject to
greater degradation by missing data points and removal of outliers. Experience on
relative occurrence of such events in actual field data will be useful in selecting
appropriate sample size.

B.  Choice of Parametric or Point Estimation (Step 2) and Sampling Rate (Step 3)

The desires of the customers who will use the smoothed data are of primary
concern here.

C.  Specifying Approximate Polynomial Order (Step 6)

It will be difficult to specify a simple rule for determining that the kth order
sequential differences contain non-random components but the (k+1)st order differences
involve only random components.  The Theory of Runs can be of some help here, although a
simpler rule is desirable; this needs further study.

D.  Rough Screening For Outliers (Step 7)

A reasonable critical level for identifying outliers by sequential differences
must be established. The occurrence of an isolated outlier was considered in Section II 3.
Other potential producers of large sequential differences such as paired outliers, violent
changes in velocity, et cetera, should be examined for resultant effects. Identification of
signatures for such effects will be useful in using sequential differences to identify
outliers.
E.  Polynomial Degree Limitations (Step 10)

The  limitation  of  polynomial  degree  to  5,  or  less,  appears  reasonable  for 
samples  of  size  11.  The  possibility  of  decreasing  this  limit  to  4  or  increasing  it  to  6  or 
higher  should  be  considered.  This  may  require  more  experience  with  in-water  run  data. 
For  smaller  sample  sizes,  such  as  n=7,  reduction  of  this  limit  to  lower  polynomial  degree 
should  be  considered. 

F.  Computing  Smoothed  Path  (Step  11) 

The pivotal condensation method outlined in Appendix A can be simplified
even further in certain cases which may occur frequently enough to take advantage of
their commonality in the computer program. In particular, when the sample consists of
n=11 data points at adjacent times, the shift of the time origin to the midpoint of the
sample produces the following effects:

(1)  coefficients  of  the  polynomial  parameters  are  the  same  in  the  normal 
equations  for  all  samples, 

(2)  only  the  last  column  in  the  pivotal  condensation  format  changes  with 
sample,  and 

(3)  the  other  columns  in  the  pivotal  condensation  format  require  only 
addition  of  a  row  and  a  column  in  each  box  when  the  next  higher  degree  polynomial  is 
considered. 

The above commonality is also clearly evident in the vector representation presented in
Appendix A. The extent to which this commonality can be exploited depends primarily
upon the rarity of missing data points and outliers. Indeed, depending upon requirements
of the ultimate users, data smoothing could conceivably be restricted to only such
samples.

In  summary,  the  data  smoothing  algorithm  presented  in  Section  IV  appears 
reasonable,  but  there  are  several  elements  that  must  be  specified  before  it  can  be 
implemented.  Some  of  these  can  be  improved  by  further  research,  others  depend  upon  the 
quality  of  the  data  which  can  only  be  determined  by  experience  with  actual  3-D  data. 
Finally,  some  of  them  can  only  be  determined  in  consultation  with  the  ultimate  users  of 
the  smoothed  data. 
APPENDIX  A

LEAST  SQUARES  DATA  SMOOTHING

A-1.  LINEAR  LEAST  SQUARES  WITH  ONE  PREDICTOR

Sample:

$(x_i, y_i)$,  $i = 1, 2, \ldots, n$

Assumptions:

A1 -- Actual relationship between X and Y is linear, i.e.,

$$\tilde{y}(x) = \alpha + \beta x$$

A2 -- Abscissas are without errors:

$$x_i = \tilde{x}_i$$

A3 -- Ordinates contain measurement or observational errors:

$$y_i = \tilde{y}_i + \epsilon_i$$

where $\epsilon_i$ = observational error and $\tilde{y}_i = \tilde{y}(x_i)$.
Problem:

Fit a straight line to the data.

Engineer's Solution:

Let

$$\hat{y}(x) = a + bx$$

$$e_i = y_i - \hat{y}(x_i)$$

$$D = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

The coefficients a and b are selected to minimize D (the sum of squares of the
deviations of the observed $y_i$'s from the fitted line).  Setting

$$\frac{\partial D}{\partial a} = \frac{\partial D}{\partial b} = 0$$

gives the two equations

$$n a + (\sum x_i)\, b = \sum y_i$$

$$(\sum x_i)\, a + (\sum x_i^2)\, b = \sum x_i y_i$$

Solving these equations yields the desired estimates a and b for the parameters
$\alpha$ and $\beta$, i.e.,

$$b = \frac{n(\sum x_i y_i) - (\sum x_i)(\sum y_i)}{n(\sum x_i^2) - (\sum x_i)^2}$$

$$a = \frac{(\sum y_i) - b(\sum x_i)}{n}$$

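Computational Format:

The following format uses pivotal condensation to produce a and b. It also yields D
and hence the sample variance

$$S_e^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2$$

without requiring calculation of the individual $e_i$'s.

$$
\begin{array}{lll|l}
n & \sum x_i & \sum y_i & A_{xx} = [\,n(\sum x_i^2) - (\sum x_i)^2\,]/n \\
  & \sum x_i^2 & \sum x_i y_i & A_{xy} = [\,n(\sum x_i y_i) - (\sum x_i)(\sum y_i)\,]/n \\
  &  & \sum y_i^2 & A_{yy} = [\,n(\sum y_i^2) - (\sum y_i)^2\,]/n \\
\hline
A_{xx} & A_{xy} &  & D = (A_{xx} A_{yy} - A_{xy}^2)/A_{xx} \\
  & A_{yy} &  & S_e^2 = D/(n-2) \\
\hline
D &  &  & b = A_{xy}/A_{xx} \\
\hline
a & b & S_e^2 & a = [\,(\sum y_i) - b(\sum x_i)\,]/n
\end{array}
$$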
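
The one-predictor format translates directly into a few lines of code. The sketch below is an illustration only; it applies the quantities A_xx, A_xy, and A_yy to the Sample II data of Appendix B-1 (shifted times as the predictor, centred x-coordinates as the response), and the function name is not from the report.

```python
import math


def straight_line(x, y):
    """Return a, b, D and S_e for y = a + b*x via A_xx, A_xy, A_yy."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    a_xx = sum(v * v for v in x) - sx * sx / n
    a_xy = sum(u * v for u, v in zip(x, y)) - sx * sy / n
    a_yy = sum(v * v for v in y) - sy * sy / n
    d = (a_xx * a_yy - a_xy ** 2) / a_xx   # residual sum of squares
    b = a_xy / a_xx
    a = (sy - b * sx) / n
    return a, b, d, math.sqrt(d / (n - 2))


# Sample II from Appendix B-1: shifted times t_i and centred x_i.
t = list(range(-5, 6))
x = [-371.2, -312.6, -248.9, -177.3, -102.8, -20.6,
     65.2, 153.3, 245.2, 337.2, 432.9]

a, b, d, s_e = straight_line(t, x)
print(round(a, 2), round(b, 2), round(d, 2), round(s_e, 2))
```

Note that D is obtained without forming any individual residual, which is the point of the format.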

Statistician's Solution:

Statisticians augment the Engineer's Solution by adding the following assumptions:

A4 -- The observational errors (the $\epsilon_i$'s) are realizations of independent
random variables, $E_i$'s, with zero means and common variance, i.e.,

$$\sigma_{E_i}^2 = \mathcal{E}[(E_i - \mu_{E_i})^2] = \sigma^2 \quad \text{for all } i.$$

A5 -- The observational errors are normally distributed random variables. This will
be denoted by

$$E_i \sim N(0, \sigma^2)$$

Now let $y_i$ denote the realization of the random variable $Y_i$.  Then

$$Y_i = \tilde{y}(x_i) + E_i$$

Further, the random variable $\hat{Y}_i$ can be expressed in the form

$$\hat{Y}_i = A + B x_i$$

where

$$B = \frac{n(\sum x_i Y_i) - (\sum x_i)(\sum Y_i)}{n(\sum x_i^2) - (\sum x_i)^2} = \frac{A_{xY}}{A_{xx}} \quad \text{and} \quad A = \frac{(\sum Y_i) - B(\sum x_i)}{n}$$
Note that A and B are linear functions of the $Y_i$'s and hence of the $E_i$'s.  It can
now be shown that

$$\mu_A = \mathcal{E}(A) = \alpha$$

and

$$\mu_B = \mathcal{E}(B) = \beta$$

so that a and b are unbiased estimators for $\alpha$ and $\beta$. The evaluation of the
variances of $\hat{Y}(x)$, A and B is simplified if the $x_i$'s are shifted so that their
mean is zero.  Then, since $\sum x_i = 0$,

$$A_{xx} = \sum x_i^2$$

$$A_{xy} = \sum x_i y_i$$

$$b = \frac{\sum x_i y_i}{\sum x_i^2}$$

$$a = \frac{1}{n} \sum y_i = \bar{y}$$
This shift in the x-axis will be assumed in the development which follows.

It can now be demonstrated that

$$\sigma_A^2 = \sigma^2 / n,$$

$$\sigma_B^2 = \sigma^2 / (\sum x_i^2),$$

$$\mathrm{Cov}(A, B) = \mathcal{E}[(A - \alpha)(B - \beta)] = 0,$$

$$\sigma_{\hat{Y}(x)}^2 = \left( \frac{1}{n} + \frac{x^2}{\sum x_i^2} \right) \sigma^2,$$

and

$$\mathcal{E}(S_e^2) = \sigma^2.$$

The last relationship is very important since $S_e^2$ is an unbiased estimator of
$\sigma^2$ and is our only source of information on this parameter.

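The assumption of normality (A5) together with linearity of the other random
variables in the $E_i$'s insures that they are also normally distributed, i.e.,

$$Y_i \sim N(\tilde{y}_i, \sigma^2),$$

$$A \sim N(\alpha, \sigma^2/n),$$

$$B \sim N(\beta, \sigma^2/\sum x_i^2), \text{ and}$$

$$\hat{Y}(x) \sim N\!\left[ \tilde{y}(x), \left( \frac{1}{n} + \frac{x^2}{\sum x_i^2} \right) \sigma^2 \right].$$

The random variable $(n-2) S_e^2 / \sigma^2$ has a Chi-Square distribution with n-2
degrees of freedom, and the random variables

$$T_A = \frac{\sqrt{n}\,(A - \alpha)}{S_e},$$

$$T_B = \frac{B - \beta}{S_e / \sqrt{\sum x_i^2}},$$

and

$$T_{\hat{Y}(x)} = \frac{\hat{Y}(x) - \tilde{y}(x)}{S_e \sqrt{\dfrac{1}{n} + \dfrac{x^2}{\sum x_i^2}}}$$

have a Student-T distribution with n-2 degrees of freedom.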

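These distributions can then be used to establish confidence intervals for
$\alpha$, $\beta$, and $\tilde{y}(x)$ at any x. Thus, for example, with k from Student-T
tables such that

$$P(-k < T < +k) = .95$$

we have the following 95% confidence intervals:

$$\left( a - \frac{k S_e}{\sqrt{n}},\; a + \frac{k S_e}{\sqrt{n}} \right)$$

for $\alpha$,

$$\left( b - \frac{k S_e}{\sqrt{\sum x_i^2}},\; b + \frac{k S_e}{\sqrt{\sum x_i^2}} \right)$$

for $\beta$, and

$$\left( \hat{y}(x) - k S_e \sqrt{\frac{1}{n} + \frac{x^2}{\sum x_i^2}},\; \hat{y}(x) + k S_e \sqrt{\frac{1}{n} + \frac{x^2}{\sum x_i^2}} \right)$$

for $\tilde{y}(x)$ at any x. It should be stressed that the confidence interval for
$\tilde{y}(x)$ given above involves measurements about the mean of x ($\bar{x} = 0$). The
general form for this confidence interval is

$$\left( \hat{y}(x) - k S_e \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}},\; \hat{y}(x) + k S_e \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \right)$$

It should be noted that the confidence interval for $\tilde{y}(x)$ is shortest for
$x = \bar{x}$ and increases as x deviates from this value.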
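
The widening of the interval away from the mean is easy to see numerically. The sketch below evaluates the half-width of the general interval for 11 equally spaced abscissas; the values S_e = 20.33 and k = 1.833 are the ones quoted in Appendix B for the straight-line fit to Sample II, and the function name is illustrative.

```python
import math


def half_width(x, xs, s_e, k):
    """Half-width k*S_e*sqrt(1/n + (x - xbar)^2 / sum((x_i - xbar)^2))."""
    n = len(xs)
    xbar = sum(xs) / n
    spread = sum((v - xbar) ** 2 for v in xs)
    return k * s_e * math.sqrt(1.0 / n + (x - xbar) ** 2 / spread)


# Shifted observation times and the values quoted in Appendix B.
times = list(range(-5, 6))
s_e, k = 20.33, 1.833

for x in (-5, 0, 5):
    print(x, round(half_width(x, times, s_e, k), 1))
```

The half-width at the ends of the sample is nearly twice that at the midpoint, which is the basis for preferring sample midpoints in point estimation with a straight line.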

A sketch of the situation can help clarify the mathematical elements involved.

[Sketch: observed points about the actual and fitted lines, plotted against x.]

$\tilde{y}(x) = \alpha + \beta x$ = actual linear relationship

$\hat{y}(x) = a + bx$ = fitted line

$\epsilon_i$ = observational error

$e_i$ = fitting error

$e(x)$ = prediction error at any x
A-2.  LINEAR  LEAST  SQUARES  WITH  TWO  PREDICTORS

Sample:

$(x_{1i}, x_{2i}, y_i)$,  $i = 1, 2, \ldots, n$

Assumptions:

A1 -- $\tilde{y}(x) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2$,  $x = (x_1, x_2)$

A2 -- $x_{1i} = \tilde{x}_{1i}$ and $x_{2i} = \tilde{x}_{2i}$

A3 -- $y_i = \tilde{y}(x_i) + \epsilon_i$,  $x_i = (x_{1i}, x_{2i})$

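Engineer's Solution:

Let

$$\hat{y}(x) = a_0 + a_1 x_1 + a_2 x_2$$

$$e_i = y_i - \hat{y}(x_i)$$

$$D = \sum e_i^2 = \sum (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})^2$$

Minimizing D,

$$\frac{\partial D}{\partial a_0} = 0, \quad \frac{\partial D}{\partial a_1} = 0, \quad \frac{\partial D}{\partial a_2} = 0$$

produces the normal equations

$$n a_0 + (\sum x_{1i}) a_1 + (\sum x_{2i}) a_2 = \sum y_i \qquad (1)$$

$$(\sum x_{1i}) a_0 + (\sum x_{1i}^2) a_1 + (\sum x_{1i} x_{2i}) a_2 = \sum x_{1i} y_i \qquad (2)$$

$$(\sum x_{2i}) a_0 + (\sum x_{1i} x_{2i}) a_1 + (\sum x_{2i}^2) a_2 = \sum x_{2i} y_i \qquad (3)$$

which can be solved for $a_0$, $a_1$, and $a_2$ in terms of sample data.

Solving (1) for $a_0$ gives

$$a_0 = [\,(\sum y_i) - (\sum x_{1i}) a_1 - (\sum x_{2i}) a_2\,]/n \qquad (1')$$

Substituting (1') in (2) and (3) gives

$$A_{11} a_1 + A_{12} a_2 = A_{1y} \qquad (2')$$

$$A_{12} a_1 + A_{22} a_2 = A_{2y} \qquad (3')$$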
Solving (2') for $a_1$ gives

$$a_1 = (A_{1y} - A_{12} a_2)/A_{11} \qquad (2'')$$

and substituting in (3') gives

$$B_{22}\, a_2 = B_{2y} \qquad (3'')$$

where the coefficients will be defined in the computational format which follows.
Equations (3''), (2'') and (1') can be used to determine the values of $a_0$, $a_1$,
and $a_2$.

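COMPUTATIONAL FORMAT

$$
\begin{array}{llll|l}
n & \sum x_{1i} & \sum x_{2i} & \sum y_i & A_{jk} = [\,n(\sum x_{ji} x_{ki}) - (\sum x_{ji})(\sum x_{ki})\,]/n \\
  & \sum x_{1i}^2 & \sum x_{1i} x_{2i} & \sum x_{1i} y_i & A_{jy} = [\,n(\sum x_{ji} y_i) - (\sum x_{ji})(\sum y_i)\,]/n \\
  &  & \sum x_{2i}^2 & \sum x_{2i} y_i & A_{yy} = [\,n(\sum y_i^2) - (\sum y_i)^2\,]/n \\
  &  &  & \sum y_i^2 &  \\
\hline
A_{11} & A_{12} & A_{1y} &  & B_{22} = [\,A_{11} A_{22} - A_{12}^2\,]/A_{11} \\
  & A_{22} & A_{2y} &  & B_{2y} = [\,A_{11} A_{2y} - A_{12} A_{1y}\,]/A_{11} \\
  &  & A_{yy} &  & B_{yy} = [\,A_{11} A_{yy} - A_{1y}^2\,]/A_{11} \\
\hline
B_{22} & B_{2y} &  &  & D_e = [\,B_{22} B_{yy} - B_{2y}^2\,]/B_{22} \\
  & B_{yy} &  &  & S_e^2 = \frac{1}{n-3} \sum e_i^2 = D_e/(n-3) \\
\hline
D_e &  &  &  & a_2 = B_{2y}/B_{22} \\
\hline
a_0 & a_1 & a_2 & S_e^2 & a_1 = [\,A_{1y} - a_2 A_{12}\,]/A_{11} \\
  &  &  &  & a_0 = [\,(\sum y_i) - a_2 (\sum x_{2i}) - a_1 (\sum x_{1i})\,]/n
\end{array}
$$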
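
The two-predictor format can be exercised on the quadratic case, where the predictors are t and t². The following sketch is illustrative only; it uses the Sample II data from Appendix B (shifted times and centred x-coordinates), and the helper names are not from the report.

```python
import math


def two_predictor_fit(x1, x2, y):
    """Pivotal condensation for y = a0 + a1*x1 + a2*x2.

    Returns (a0, a1, a2, S_e) with S_e^2 = D_e / (n - 3).
    """
    n = len(y)

    def corrected(u, v):
        # [n*sum(u*v) - sum(u)*sum(v)] / n
        return sum(p * q for p, q in zip(u, v)) - sum(u) * sum(v) / n

    a11, a12, a22 = corrected(x1, x1), corrected(x1, x2), corrected(x2, x2)
    a1y, a2y, ayy = corrected(x1, y), corrected(x2, y), corrected(y, y)

    # First condensation: eliminate the first predictor.
    b22 = (a11 * a22 - a12 ** 2) / a11
    b2y = (a11 * a2y - a12 * a1y) / a11
    byy = (a11 * ayy - a1y ** 2) / a11

    # Second condensation gives the residual sum of squares.
    d_e = (b22 * byy - b2y ** 2) / b22

    a2 = b2y / b22
    a1 = (a1y - a2 * a12) / a11
    a0 = (sum(y) - a2 * sum(x2) - a1 * sum(x1)) / n
    return a0, a1, a2, math.sqrt(d_e / (n - 3))


t = list(range(-5, 6))
x = [-371.2, -312.6, -248.9, -177.3, -102.8, -20.6,
     65.2, 153.3, 245.2, 337.2, 432.9]

a0, a1, a2, s_e = two_predictor_fit(t, [ti ** 2 for ti in t], x)
print(round(a0, 2), round(a1, 2), round(a2, 3), round(s_e, 2))
```

The results can be compared with the quadratic regression coefficients and S_e listed for Sample II in Appendix B-2.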


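Statistician's Solution

Assumptions A4 and A5 lead to the following random variables and their distributions:

E = observational error in y at $(x_1, x_2)$

$$E \sim N(0, \sigma^2)$$

$$Y(x_1, x_2) = \tilde{y}(x_1, x_2) + E \sim N(\tilde{y}(x_1, x_2), \sigma^2)$$

$$A_2 = B_{2Y}/B_{22} \sim N\!\left( \alpha_2, \frac{A_{11}\, \sigma^2}{A_{11} A_{22} - A_{12}^2} \right)$$

$$A_1 = [\,A_{1Y} - A_2 A_{12}\,]/A_{11} \sim N\!\left( \alpha_1, \frac{A_{22}\, \sigma^2}{A_{11} A_{22} - A_{12}^2} \right)$$

Also

$$A_0 = [\,(\sum Y_i) - A_1 \sum x_{1i} - A_2 \sum x_{2i}\,]/n \sim N(\alpha_0, \sigma^2/n)$$

$$\mathrm{Cov}(A_0, A_1) = \mathrm{Cov}(A_0, A_2) = 0$$

$$\mathrm{Cov}(A_1, A_2) = \frac{-A_{12}\, \sigma^2}{A_{11} A_{22} - A_{12}^2}$$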

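Then for a predicted value $\hat{Y}(x_1, x_2)$ at any point $(x_1, x_2)$ we have

$$\sigma_{\hat{Y}}^2 = \left[ \frac{1}{n} + \frac{A_{22}}{A_{11} B_{22}}\, x_1^2 - 2\, \frac{A_{12}}{A_{11} B_{22}}\, x_1 x_2 + \frac{A_{11}}{A_{11} B_{22}}\, x_2^2 \right] \sigma^2$$

This together with

$$\mu_{\hat{Y}} = \mathcal{E}(\hat{Y}) = \tilde{y}(x_1, x_2)$$

and the fact that

$$\hat{Y}(x_1, x_2) \sim N(\mu_{\hat{Y}}, \sigma_{\hat{Y}}^2)$$

can be used to establish confidence intervals for $\tilde{y}(x_1, x_2)$.

CAUTION: In deriving these formulas it was assumed that $\bar{x}_1 = \bar{x}_2 = 0$. For
data in which this shift has not been made, the formulae must be adjusted.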

Quadratic Model

The quadratic model

$$\hat{y} = a_0 + a_1 x + a_2 x^2$$

can be transformed into a linear model with two predictors by the transformation

$$x_1 = x, \quad x_2 = x^2$$
A-3.  LINEAR  LEAST  SQUARES  WITH  THREE  PREDICTORS

Sample:

$(x_{1i}, x_{2i}, x_{3i}, y_i)$,  $i = 1, 2, \ldots, n$

Assumptions:

A1 -- $\tilde{y}(x_1, x_2, x_3) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3$

A2 -- $x_{ji} = \tilde{x}_{ji}$,  $j = 1, 2, 3$

A3 -- $y_i = \tilde{y}(x_{1i}, x_{2i}, x_{3i}) + \epsilon_i$
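Computational Format

$$
\begin{array}{lllll|l}
n & \sum x_{1i} & \sum x_{2i} & \sum x_{3i} & \sum y_i & A_{uv} = [\,n(\sum x_{ui} x_{vi}) - (\sum x_{ui})(\sum x_{vi})\,]/n, \quad u, v = 1, 2, 3 \\
  & \sum x_{1i}^2 & \sum x_{1i} x_{2i} & \sum x_{1i} x_{3i} & \sum x_{1i} y_i & A_{uy} = [\,n(\sum x_{ui} y_i) - (\sum x_{ui})(\sum y_i)\,]/n \\
  &  & \sum x_{2i}^2 & \sum x_{2i} x_{3i} & \sum x_{2i} y_i &  \\
  &  &  & \sum x_{3i}^2 & \sum x_{3i} y_i &  \\
  &  &  &  & \sum y_i^2 &  \\
\hline
A_{11} & A_{12} & A_{13} & A_{1y} &  & B_{uv} = (A_{11} A_{uv} - A_{1u} A_{1v})/A_{11} \\
  & A_{22} & A_{23} & A_{2y} &  &  \\
  &  & A_{33} & A_{3y} &  &  \\
  &  &  & A_{yy} &  &  \\
\hline
B_{22} & B_{23} & B_{2y} &  &  & C_{uv} = (B_{22} B_{uv} - B_{2u} B_{2v})/B_{22} \\
  & B_{33} & B_{3y} &  &  &  \\
  &  & B_{yy} &  &  &  \\
\hline
C_{33} & C_{3y} &  &  &  & D_e = (C_{33} C_{yy} - C_{3y}^2)/C_{33} \\
  & C_{yy} &  &  &  & S_e^2 = D_e/(n-4) = \frac{1}{n-4} \sum e_i^2 \\
\hline
D_e &  &  &  &  & a_3 = C_{3y}/C_{33} \\
\hline
a_0 & a_1 & a_2 & a_3 & S_e^2 & a_2 = (B_{2y} - a_3 B_{23})/B_{22}
\end{array}
$$

$$a_1 = (A_{1y} - a_3 A_{13} - a_2 A_{12})/A_{11}$$

$$a_0 = (\sum y_i - a_3 \sum x_{3i} - a_2 \sum x_{2i} - a_1 \sum x_{1i})/n$$

$$\hat{y}(x_1, x_2, x_3) = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3$$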

Statistics

$$\hat{Y}(x_1, x_2, x_3) = A_0 + A_1 x_1 + A_2 x_2 + A_3 x_3 = \text{Prediction Equation}$$

It can be seen that the $A_j$'s, and hence $\hat{Y}(x_1, x_2, x_3)$, are normally
distributed. Determining their means and variances is quite mathematically involved and
will be delayed until the vector solution is considered.

Cubic Polynomial

$$\hat{y}(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3$$

Transformation

$$x_1 = x, \quad x_2 = x^2, \quad x_3 = x^3$$

$$\hat{y}(x) = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3$$
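LINEAR  LEAST  SQUARES  WITH  k  PREDICTORS

Sample Data:

$(x_{1i}, \ldots, x_{ki}, y_i)$,  $i = 1, \ldots, n$

Linear Model:

$$\tilde{y} = \alpha_0 + \alpha_1 x_1 + \ldots + \alpha_k x_k$$

Prediction:

$$\hat{y} = a_0 + a_1 x_1 + \ldots + a_k x_k$$

Computational Format

$$
\begin{array}{llllll}
n & \sum x_{1i} & \sum x_{2i} & \cdots & \sum x_{ki} & \sum y_i \\
  & \sum x_{1i}^2 & \sum x_{1i} x_{2i} & \cdots & \sum x_{1i} x_{ki} & \sum x_{1i} y_i \\
  &  & \sum x_{2i}^2 & \cdots & \sum x_{2i} x_{ki} & \sum x_{2i} y_i \\
  &  &  & \ddots & \vdots & \vdots \\
  &  &  &  & \sum x_{ki}^2 & \sum x_{ki} y_i \\
  &  &  &  &  & \sum y_i^2 \\
\hline
A_{11} & A_{12} & \cdots & A_{1k} & A_{1y} &  \\
  & A_{22} & \cdots & A_{2k} & A_{2y} &  \\
  &  & \ddots & \vdots & \vdots &  \\
  &  &  & A_{kk} & A_{ky} &  \\
  &  &  &  & A_{yy} &  \\
\hline
A_{2.22} & A_{2.23} & \cdots & A_{2.2k} & A_{2.2y} &  \\
  & A_{2.33} & \cdots & A_{2.3k} & A_{2.3y} &  \\
  &  & \ddots & \vdots & \vdots &  \\
  &  &  &  & A_{2.yy} &  \\
\hline
\vdots &  &  &  &  &  \\
\hline
A_{k.kk} & A_{k.ky} &  &  &  &  \\
  & A_{k.yy} &  &  &  &  \\
\hline
D_e &  &  &  &  &  \\
\hline
a_0 & a_1 & \cdots & a_k & S_e^2 &
\end{array}
$$

$$S_e^2 = D_e/(n-k-1)$$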
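LINEAR  LEAST  SQUARES  IN  VECTOR  FORM

This will be presented for k predictors $(x_1, \ldots, x_k)$. Let the sample data be
$(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i)$ with $i = 1, \ldots, n$.  This data can be presented
as a vector $\mathbf{y}$ and a matrix $\mathbf{X}$ where

$$
\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
\quad \text{and} \quad
\mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{k1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{kn} \end{pmatrix}
$$

Now define vectors $\boldsymbol{\alpha}$, $\mathbf{x}$ and $\mathbf{a}$ as

$$
\boldsymbol{\alpha} = \begin{pmatrix} \alpha_0 \\ \vdots \\ \alpha_k \end{pmatrix}, \quad
\mathbf{x} = \begin{pmatrix} x_0 \\ \vdots \\ x_k \end{pmatrix}, \quad \text{and} \quad
\mathbf{a} = \begin{pmatrix} a_0 \\ \vdots \\ a_k \end{pmatrix}
$$

where $x_0 = 1$.  The linear model then takes the form

$$\tilde{y} = \tilde{y}(x_1, \ldots, x_k) = \boldsymbol{\alpha}^T \mathbf{x} = \sum_{j=0}^{k} \alpha_j x_j$$

where $\boldsymbol{\alpha}^T$ denotes the row vector which is the transpose of
$\boldsymbol{\alpha}$, i.e.,

$$\boldsymbol{\alpha}^T = (\alpha_0, \alpha_1, \ldots, \alpha_k)$$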

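The fitted equation is

$$\hat{y} = \hat{y}(x_1, \ldots, x_k) = \mathbf{a}^T \mathbf{x} = \sum_{j=0}^{k} a_j x_j$$

where the $a_j$'s are established to minimize

$$D = \sum e_i^2$$

with

$$e_i = y_i - \hat{y}(x_{1i}, \ldots, x_{ki}) = y_i - \sum_{j=0}^{k} a_j x_{ji}$$

In vector form, we have

$$\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{a}$$

so that

$$D = \mathbf{e}^T \mathbf{e}$$

The normal equations (to minimize D) are

$$\mathbf{X}^T \mathbf{X}\, \mathbf{a} = \mathbf{X}^T \mathbf{y}$$

with the solution

$$\mathbf{a} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$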
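Expressing the coefficients in terms of random variables, we have

$$\mathbf{A} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}$$

where

$$\mathbf{Y}^T = (Y_1, \ldots, Y_n) = (\tilde{\mathbf{y}} + \mathbf{E})^T = \tilde{\mathbf{y}}^T + \mathbf{E}^T$$

with

$$\mathbf{E}^T = (E_1, \ldots, E_n)$$

$$\tilde{\mathbf{y}}^T = (\tilde{y}_1, \ldots, \tilde{y}_n) = (\mathbf{X} \boldsymbol{\alpha})^T$$

Using

$$\mathbf{Y} = \tilde{\mathbf{y}} + \mathbf{E}$$

$$\mathcal{E}(\mathbf{E}) = \mathbf{0}$$

$$\mathcal{E}(\mathbf{E} \mathbf{E}^T) = \mathbf{I} \sigma^2$$

where $\mathbf{I}$ is the n x n identity matrix, we have

$$\mathcal{E}(\mathbf{Y}) = \tilde{\mathbf{y}}$$

$$\mathcal{E}(\mathbf{Y} \mathbf{Y}^T) = \mathbf{I} \sigma^2 + \tilde{\mathbf{y}} \tilde{\mathbf{y}}^T$$

and hence the covariance matrix for $\mathbf{Y}$ is

$$\mathrm{Cov}(\mathbf{Y}, \mathbf{Y}^T) = \mathcal{E}(\mathbf{Y} \mathbf{Y}^T) - \tilde{\mathbf{y}} \tilde{\mathbf{y}}^T = \mathbf{I} \sigma^2$$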
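Then

$$\mathcal{E}(\mathbf{A}) = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathcal{E}(\mathbf{Y}) = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{X} \boldsymbol{\alpha} = \boldsymbol{\alpha}$$

Thus $\mathbf{a}$ provides unbiased estimates for the elements of $\boldsymbol{\alpha}$.

For the variances and covariances of the coefficients we have

$$\mathrm{Cov}(\mathbf{A}, \mathbf{A}^T) = (\mathbf{X}^T \mathbf{X})^{-1} \sigma^2$$

Finally, for $\hat{Y}$ at any $\mathbf{x}$ we have

$$\mathcal{E}(\hat{Y}) = \tilde{y}$$

and

$$\sigma_{\hat{Y}}^2 = \mathbf{x}^T\, \mathrm{Cov}(\mathbf{A}, \mathbf{A}^T)\, \mathbf{x} = \mathbf{x}^T (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{x}\, \sigma^2$$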
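
As a consistency check on the vector form, the one-predictor case of Section A-1 can be worked through directly. The following is a short worked example, assuming k = 1 and abscissas shifted so that $\sum x_i = 0$; the symbols are those of the appendix.

```latex
% One predictor, centred abscissas: the rows of X are (1, x_i) with sum x_i = 0.
\mathbf{X}^T \mathbf{X}
  = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}
  = \begin{pmatrix} n & 0 \\ 0 & \sum x_i^2 \end{pmatrix},
\qquad
(\mathbf{X}^T \mathbf{X})^{-1}
  = \begin{pmatrix} 1/n & 0 \\ 0 & 1/\sum x_i^2 \end{pmatrix}

% Covariance matrix of the coefficient estimators (A, B):
\mathrm{Cov}(\mathbf{A}, \mathbf{A}^T)
  = \sigma^2 \begin{pmatrix} 1/n & 0 \\ 0 & 1/\sum x_i^2 \end{pmatrix}
\;\Longrightarrow\;
\sigma_A^2 = \frac{\sigma^2}{n}, \quad
\sigma_B^2 = \frac{\sigma^2}{\sum x_i^2}, \quad
\mathrm{Cov}(A, B) = 0

% Variance of the prediction at x, using the vector (1, x)^T:
\sigma_{\hat{Y}(x)}^2
  = \begin{pmatrix} 1 & x \end{pmatrix}
    (\mathbf{X}^T \mathbf{X})^{-1}
    \begin{pmatrix} 1 \\ x \end{pmatrix} \sigma^2
  = \left( \frac{1}{n} + \frac{x^2}{\sum x_i^2} \right) \sigma^2
```

These agree with the variances stated for the straight line in Section A-1.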
APPENDIX  B

SAMPLE  LEAST  SQUARES  CALCULATION

B-1.  STRAIGHT  LINE  REGRESSION  FOR  SAMPLE  II

  i    t'_i     x'_i     t_i     x_i     \hat{x}_i    e_xi

  1    872    2183.2    -5    -371.2    -405.9     +34.7
  2    873    2241.3    -4    -312.6    -324.7     +12.1
  3    874    2305.5    -3    -248.9    -243.5      -5.4
  4    875    2377.1    -2    -177.3    -162.3     -15.0
  5    876    2451.6    -1    -102.8     -81.2     -21.6
  6    877    2533.8     0     -20.6       0.0     -20.6
  7    878    2619.6     1      65.2      81.2     -16.0
  8    879    2707.7     2     153.3     162.4      -9.1
  9    880    2799.6     3     245.2     243.6      +1.6
 10    881    2891.6     4     337.2     324.8     +12.4
 11    882    2987.3     5     432.9     406.0     +26.9

 SUM         28098.8     0       0.4       0.4       0.0

$\bar{x} = \frac{1}{n} \sum x'_i = 2554.44$,   $x_i = x'_i - \bar{x}$


n=ll         2^=0  Zx^O.4 

Zt?    =110  2tix.=8/931.2 

2x^=728,868.12 


A      =110  A      =8,931.2 


i  i 


1  x 

A      =728,368.11 
xx 
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Aee=3719.62 
2 


A    =0.04  A   =81.19  S   e=Aee/(n-2}=413.29      Se  =  20.33 


X(t) =a    +   a    t=0. 04+81. 19t 

o  1 
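
The straight-line calculation of B-1 can be reproduced with a few lines of code. This is a minimal sketch using only the centred values t_i and x_i from the table. The variable names are illustrative, and the formulas are the usual corrected sums of squares and cross-products, assumed here to correspond to the A quantities of the report.

```python
import math

# Centred data from the B-1 table: t_i and x_i = x'_i - mean(x').
t = list(range(-5, 6))
x = [-371.2, -312.6, -248.9, -177.3, -102.8, -20.6,
     65.2, 153.3, 245.2, 337.2, 432.9]
n = len(t)

sum_t = sum(t)
sum_x = sum(x)

# Corrected sums of squares and cross-products.
A_11 = sum(ti * ti for ti in t) - sum_t ** 2 / n
A_1x = sum(ti * xi for ti, xi in zip(t, x)) - sum_t * sum_x / n
A_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n

a1 = A_1x / A_11                    # slope
a0 = sum_x / n - a1 * sum_t / n     # intercept
A_ee = A_xx - A_1x ** 2 / A_11      # residual sum of squares
s2_e = A_ee / (n - 2)               # residual variance, n - 2 d.f.
s_e = math.sqrt(s2_e)

# Fitted values and residuals, for comparison with the table columns.
x_hat = [a0 + a1 * ti for ti in t]
resid = [xi - xh for xi, xh in zip(x, x_hat)]
```

Rounded to the precision used in the report, the sketch gives the same slope, residual sum of squares, and standard error as the values listed above.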


B-2. QUADRATIC REGRESSION FOR SAMPLE II


t_1i = t_i   t_2i = t_i^2      x_i      x̂_i     e_xi

    -5           25         -371.2   -375.1     +3.9
    -4           16         -312.6   -312.4     -0.2
    -3            9         -248.9   -245.6     -3.3
    -2            4         -177.3   -174.7     -2.6
    -1            1         -102.8    -99.7     -3.1
     0            0          -20.6    -20.5     -0.1
     1            1           65.2    +62.7     +2.5
     2            4          153.3   +150.1     +3.2
     3            9          245.2   +241.6     +3.6
     4           16          337.2   +337.1     +0.1
     5           25          432.9   +436.8     -3.9

SUM  0          110            0.4      0.3      0.1

$$n = 11, \qquad \sum t_{1i} = 0, \qquad \sum t_{2i} = 110, \qquad \sum x_i = 0.4$$

$$\sum t_{1i}^2 = 110, \qquad \sum t_{1i}t_{2i} = 0, \qquad \sum t_{1i}x_i = 8{,}931.2$$

$$\sum t_{2i}^2 = 1958, \qquad \sum t_{2i}x_i = 1769.2$$

$$\sum x_i^2 = 728{,}868.1$$

$$A_{11} = 110, \qquad A_{12} = 0, \qquad A_{22} = 858$$

$$A_{1x} = 8{,}931.2, \qquad A_{2x} = 1765.2, \qquad A_{xx} = 728{,}868.11$$

$$A_{2.22} = 858, \qquad A_{2.2x} = 1765.2, \qquad A_{2.xx} = 3719.62$$

$$A_{ee} = 87.999$$

$$a_0 = -20.53, \qquad a_1 = 81.19, \qquad a_2 = 2.057$$

$$S_e^2 = A_{ee}/8 = 11.00, \qquad S_e = 3.32$$

$$\hat{X}(t) = -20.53 + 81.19\,t + 2.057\,t^2$$
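
The quadratic calculation of B-2 can be checked in the same way. The sketch below assumes that the quantities written A_{2.22}, A_{2.2x}, and A_{2.xx} are the corrected sums after the linear term has been eliminated, which is consistent with the numbers listed above. The helper `corrected` and the variable names are illustrative only.

```python
import math

t = list(range(-5, 6))
x = [-371.2, -312.6, -248.9, -177.3, -102.8, -20.6,
     65.2, 153.3, 245.2, 337.2, 432.9]
n = len(t)
t2 = [ti * ti for ti in t]   # second regressor, t_2i = t_i^2

def corrected(u, v):
    """Corrected sum of cross-products: sum(u*v) - sum(u)*sum(v)/n."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n

A_11, A_12, A_22 = corrected(t, t), corrected(t, t2), corrected(t2, t2)
A_1x, A_2x, A_xx = corrected(t, x), corrected(t2, x), corrected(x, x)

# Eliminate the linear term (assumed meaning of the "2." subscripts).
A_2_22 = A_22 - A_12 ** 2 / A_11
A_2_2x = A_2x - A_12 * A_1x / A_11
A_2_xx = A_xx - A_1x ** 2 / A_11

a2 = A_2_2x / A_2_22
a1 = (A_1x - a2 * A_12) / A_11
a0 = sum(x) / n - a1 * sum(t) / n - a2 * sum(t2) / n

A_ee = A_2_xx - A_2_2x ** 2 / A_2_22   # residual sum of squares
s_e = math.sqrt(A_ee / (n - 3))        # n - 3 degrees of freedom
```

Because the times are centred and symmetric, A_12 vanishes and the slope a_1 is the same as in the straight-line fit. The computed a_0 differs from the tabulated value only in the last digit, since the report rounds a_2 before forming a_0.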


B-3. CONFIDENCE INTERVALS FOR ESTIMATES

In   Appendix   A,   it   was   shown   that   the 
confidence  intervals  for  x(t)  at  any  time  t  had  the  form 

$$\left(\hat{x}(t) - C_j(t)\,S_{je},\ \hat{x}(t) + C_j(t)\,S_{je}\right), \qquad j = 1, 2$$

where 


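$$C_1(t) = k_1\sqrt{\frac{1}{n} + \frac{t^2}{A_{11}}}$$

$$C_2(t) = k_2\sqrt{\frac{1}{n} + \left(\frac{A_{22}}{A_{11}A_{2.22}}\right)t_1^2 - 2\left(\frac{A_{12}}{A_{11}A_{2.22}}\right)t_1\left(t_2 - \bar{t}_2\right) + \left(\frac{A_{11}}{A_{11}A_{2.22}}\right)\left(t_2 - \bar{t}_2\right)^2}$$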

are the appropriate terms for the linear and quadratic regression curves, respectively. For a 95% confidence level and n-2 or n-3 degrees of freedom for the Student t distribution we find $k_1 = 1.833$ and $k_2 = 1.860$ (the 0.95 quantiles for 9 and 8 degrees of freedom). Introducing the numerical values determined in the preceding sections of this appendix, we find


$$C_1(t) = 1.833\sqrt{\frac{1}{11} + \frac{t^2}{110}}$$

$$C_2(t) = 1.860\sqrt{\frac{1}{11} + \frac{t^2}{110} + \frac{(t^2 - 10)^2}{858}}$$

The relationships of $C_1(t)$ and $C_2(t)$ and the increments $S_{1e}C_1(t)$ and $S_{2e}C_2(t)$ are shown below using $S_{1e} = 20.33$ and $S_{2e} = 3.32$.

   t     C_1(t)    C_2(t)    S_1e C_1(t)    S_2e C_2(t)

   0      .553      .847        11.24           2.81
  ±1      .580      .820        11.79           2.72
  ±2      .654      .765        13.30           2.54
  ±3      .762      .776        15.49           2.58
  ±4      .891      .981        18.11           3.26
  ±5     1.034     1.417        21.02           4.70
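
The tabulated multipliers and increments follow directly from the two numerical expressions for C_1(t) and C_2(t). The sketch below recomputes them, taking k_1, k_2, S_1e, and S_2e as given in the text. The function names `c1` and `c2` and the constant names are illustrative.

```python
import math

n = 11
A_11 = 110.0       # corrected sum of squares of t
A_2_22 = 858.0     # corrected sum of squares of t^2
T2_BAR = 10.0      # mean of t^2 over t = -5..5
K1, K2 = 1.833, 1.860       # Student t values quoted in the text
S1E, S2E = 20.33, 3.32      # standard errors from B-1 and B-2

def c1(t):
    """Multiplier for the straight-line fit."""
    return K1 * math.sqrt(1.0 / n + t * t / A_11)

def c2(t):
    """Multiplier for the quadratic fit."""
    return K2 * math.sqrt(1.0 / n + t * t / A_11
                          + (t * t - T2_BAR) ** 2 / A_2_22)

# One row per |t|: (t, C1, C2, S1e*C1, S2e*C2).
rows = [(t, c1(t), c2(t), S1E * c1(t), S2E * c2(t)) for t in range(6)]
for t, m1, m2, h1, h2 in rows:
    print(f"{t:2d} {m1:6.3f} {m2:6.3f} {h1:6.2f} {h2:5.2f}")
```

Both multipliers depend on t only through t², so each row applies equally to +t and -t.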
The confidence interval for x(t) is shortest at
t=0  (the  sample  midpoint)  for  the  linear  regression.  To  find 
the  value  of  t  in  the  quadratic  regression  for  which  the 
confidence  interval  is  shortest,  consider 


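$$z = \frac{1}{11} + \frac{t^2}{110} + \frac{(t^2 - 10)^2}{858}$$

now

$$\frac{dz}{d(t^2)} = \frac{1}{110} + \frac{2(t^2 - 10)}{858} = 0$$

$$220\,t^2 = 2200 - 858 = 1342$$

$$t^2 = 6.10$$

$$t = 2.47$$

$$C_2(2.47) = 0.7535$$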
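
The location of the shortest interval for the quadratic fit can be confirmed numerically. This is a minimal sketch that treats z as a function of u = t² and uses the closed-form root of dz/du = 0. The names `z`, `u_min`, `t_min`, and `c2_min` are illustrative.

```python
import math

n = 11
A_11 = 110.0
A_2_22 = 858.0
T2_BAR = 10.0
K2 = 1.860

def z(u):
    """z as a function of u = t^2."""
    return 1.0 / n + u / A_11 + (u - T2_BAR) ** 2 / A_2_22

# dz/du = 1/A_11 + 2(u - T2_BAR)/A_2_22 = 0 gives u in closed form.
u_min = T2_BAR - A_2_22 / (2.0 * A_11)
t_min = math.sqrt(u_min)
c2_min = K2 * math.sqrt(z(u_min))

# Coarse grid check that no nearby u gives a smaller z.
grid = [i / 100.0 for i in range(0, 2501)]
z_grid_min = min(z(u) for u in grid)
```

Since z is a convex quadratic in u, the stationary point is the minimum, and by symmetry the same value is attained at t = -2.47.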
[Figure: increments S_e C(t) plotted against t for the linear regression, S_1e C_1(t), and the quadratic regression, S_2e C_2(t).]